<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[min{power}]]></title><description><![CDATA[Explorations in computing and robotics focused on power-efficiency and safety -- personal posts by Avik De, robotics Ph.D. and founder]]></description><link>https://www.avikde.me</link><image><url>https://substackcdn.com/image/fetch/$s_!Z7FY!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png</url><title>min{power}</title><link>https://www.avikde.me</link></image><generator>Substack</generator><lastBuildDate>Mon, 22 Jun 2026 17:49:55 GMT</lastBuildDate><atom:link href="https://www.avikde.me/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Avik De]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[minpower@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[minpower@substack.com]]></itunes:email><itunes:name><![CDATA[Avik De]]></itunes:name></itunes:owner><itunes:author><![CDATA[Avik De]]></itunes:author><googleplay:owner><![CDATA[minpower@substack.com]]></googleplay:owner><googleplay:email><![CDATA[minpower@substack.com]]></googleplay:email><googleplay:author><![CDATA[Avik De]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[What an Alternate History of the RISC vs. CISC Debate Teaches Us About Robotics]]></title><description><![CDATA[Explorations in computing and robotics focused on power-efficiency and safety -- personal posts by Avik De, robotics Ph.D. 
and founder]]></description><link>https://www.avikde.me/p/what-an-alternate-history-of-the</link><guid isPermaLink="false">https://www.avikde.me/p/what-an-alternate-history-of-the</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Thu, 18 Jun 2026 14:33:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IAZS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2857f9-887c-4e16-82d2-3b19b74eaef9_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">Recently, Elon Musk </span><a href="https://x.com/elonmusk/status/2021745508277268824"><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">staked a claim</span></a><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);"> that LLMs would eliminate the need for high-level computer programming languages, since they could compile directly to machine code.</span></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zdRc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726eb415-21bd-4b47-9dae-1423a50159ba_1044x392.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zdRc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726eb415-21bd-4b47-9dae-1423a50159ba_1044x392.png 424w, https://substackcdn.com/image/fetch/$s_!zdRc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726eb415-21bd-4b47-9dae-1423a50159ba_1044x392.png 848w, 
https://substackcdn.com/image/fetch/$s_!zdRc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726eb415-21bd-4b47-9dae-1423a50159ba_1044x392.png 1272w, https://substackcdn.com/image/fetch/$s_!zdRc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726eb415-21bd-4b47-9dae-1423a50159ba_1044x392.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zdRc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726eb415-21bd-4b47-9dae-1423a50159ba_1044x392.png" width="1044" height="392" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/726eb415-21bd-4b47-9dae-1423a50159ba_1044x392.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:392,&quot;width&quot;:1044,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:86592,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/200128061?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726eb415-21bd-4b47-9dae-1423a50159ba_1044x392.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zdRc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726eb415-21bd-4b47-9dae-1423a50159ba_1044x392.png 424w, https://substackcdn.com/image/fetch/$s_!zdRc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726eb415-21bd-4b47-9dae-1423a50159ba_1044x392.png 
848w, https://substackcdn.com/image/fetch/$s_!zdRc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726eb415-21bd-4b47-9dae-1423a50159ba_1044x392.png 1272w, https://substackcdn.com/image/fetch/$s_!zdRc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726eb415-21bd-4b47-9dae-1423a50159ba_1044x392.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">Effectively, underlying that claim is a comparison between 
two pathways:</span></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IAZS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2857f9-887c-4e16-82d2-3b19b74eaef9_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IAZS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2857f9-887c-4e16-82d2-3b19b74eaef9_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!IAZS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2857f9-887c-4e16-82d2-3b19b74eaef9_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!IAZS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2857f9-887c-4e16-82d2-3b19b74eaef9_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!IAZS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2857f9-887c-4e16-82d2-3b19b74eaef9_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IAZS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2857f9-887c-4e16-82d2-3b19b74eaef9_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c2857f9-887c-4e16-82d2-3b19b74eaef9_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IAZS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2857f9-887c-4e16-82d2-3b19b74eaef9_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!IAZS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2857f9-887c-4e16-82d2-3b19b74eaef9_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!IAZS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2857f9-887c-4e16-82d2-3b19b74eaef9_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!IAZS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2857f9-887c-4e16-82d2-3b19b74eaef9_1536x1024.png 1456w" sizes="100vw"></picture></div></a></figure></div>
<p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">You&#8217;ll notice that the new pathway bypasses or short-circuits an important layer of abstraction in modern computing, the Instruction Set Architecture (ISA). Associated with that, it also eliminates the compiler. Importantly, it doesn&#8217;t eliminate instructions; ultimately, those need to be emitted either way in order to drive the hardware.</span></p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">The question of what characteristics the ISA should have spawned the great </span><a href="https://chipinsights.net/p/the-isa-debate">RISC vs. CISC debate</a><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);"> in computer engineering. We still use both RISC and CISC architectures, so it&#8217;s a subtle debate with merits on both sides. 
It also had some formative effects on the history of computer engineering, such as:</span></p><ul><li><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">The nature of the hardware ecosystem &#8212; how much it favors incumbents vs. new entrants</span></p></li><li><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">The nature of hardware innovation &#8212; how viable different hardware architectures are</span></p></li><li><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">The process by which software is interfaced to and optimized for hardware &#8212; compilers, toolchains, etc.</span></p></li></ul><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">The ISA debate itself is not the locus of innovation any longer, but it has had substantial influence on how desktop, mobile, and server computing have developed. That&#8217;s all history now.</span></p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">However, what if AI&#8217;s capabilities had arrived 30 years sooner, and Musk&#8217;s proposed scheme was the accepted software development paradigm? Would the RISC vs. CISC debate look the same?</span></p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">On a separate note, in my </span><a href="https://www.avikde.me/p/action-tokens-fine-grained-vs-behavioral">previous post</a><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);"> I discussed the importance of action tokens in robotics. Their granularity is an important design choice for robotic systems, and we have seen them range from very fine (joint commands) to very coarse (behavioral primitives). These are analogous to RISC and CISC instructions! However, importantly, robotics </span><a href="https://www.avikde.me/p/a-multi-robot-brain-is-not-like-a">does not have the notion of an ISA or a compiler</a><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">. 
The lessons of this alternate history of computing may be quite relevant to some of the big questions about how a &#8220;robotics ecosystem&#8221; takes shape over the coming years.</span></p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">In the following sections, we&#8217;ll go over:</span></p><ul><li><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">A very brief review of the key points in the RISC vs. CISC debate, and how they have shaped computing&#8217;s landscape</span></p></li><li><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">What the landscape may have looked like without the ISA layer</span></p></li><li><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">What this means for robotics</span></p></li></ul><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption"><em>If you enjoy this post, please consider sharing and subscribing for free. 
One of my posts was recently <a href="https://substack.com/@avikde/note/c-277881308?r=5vzx85&amp;utm_source=notes-share-action&amp;utm_medium=web">published in IEEE Spectrum magazine</a>, and I&#8217;m incredibly grateful for the reader support that helped make that possible.</em></p></div></div></div><p><em>This article is co-written with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Bharath Suresh&quot;,&quot;id&quot;:178190448,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23b7c14a-5bd1-4a78-9ac8-c5d6eda62bfc_2048x2048.jpeg&quot;,&quot;uuid&quot;:&quot;59d394db-472d-45c2-93a6-4fca886b5dbc&quot;}" data-component-name="MentionToDOM">Bharath Suresh</span>. 
Subscribe to <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Chip Insights&quot;,&quot;id&quot;:2850528,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/chipinsights&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png&quot;,&quot;uuid&quot;:&quot;888ce16a-128d-4630-8332-3f5b2ecb5029&quot;}" data-component-name="MentionToDOM">Chip Insights</span> for narrative and insightful posts on computer architecture.</em></p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:2850528,&quot;name&quot;:&quot;Chip Insights&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z-fT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png&quot;,&quot;base_url&quot;:&quot;https://chipinsights.net&quot;,&quot;hero_text&quot;:&quot;Semiconductor Industry Deep Dives&quot;,&quot;author_name&quot;:&quot;Bharath Suresh&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#020617&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://chipinsights.net?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!Z-fT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png" width="56" height="56" style="background-color: rgb(2, 6, 23);"><span class="embedded-publication-name">Chip Insights</span><div class="embedded-publication-hero-text">Semiconductor Industry Deep Dives</div><div class="embedded-publication-author-name">By Bharath 
Suresh</div></a></div></div><h2><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">The RISC vs. CISC Debate and its Implications</span></h2><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">RISC instructions each do a very simple piece of work, which means many instructions are needed to solve a complex task. The compiler does the job of breaking up a complex task into these small steps. CISC instructions, on the other hand, can have the CPU do a large piece of a task in a single dispatch. x86, a CISC architecture, has dominated desktop computing for several decades, while Arm, a RISC architecture, has dominated the mobile market. The RISC vs. CISC debate wouldn&#8217;t exist if there were a clear winner, which suggests that both sides have merits and tradeoffs. For curious readers, I&#8217;ve linked further resources below, but I want to highlight some of the broader ecosystem implications of one vs. the other:</span></p><p>The simplicity of RISC instructions makes it easier to build hardware: the ISA is simpler to implement, and this leads to <strong>relatively more hardware entrants</strong>. 
This has played out in computing, with a <a href="https://thechipletter.substack.com/p/the-risc-wars-part-1-the-cambrian-c55">Cambrian explosion of RISC entrants</a>.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> Most computer organization and architecture courses have students build a RISC processor, and there are significantly more freely available resources to assist with this process.</p><p>Because the RISC ecosystem is easier to enter, we tend to see <strong>more diversity in the microarchitecture</strong>. A good example is the <a href="https://chipsandcheese.com/p/condors-cuzco-risc-v-core-at-hot">Condor Cuzco RISC-V core</a>. Different instructions have different latencies, especially once memory access time is taken into account, and optimal performance requires scheduling instructions at precise times so that their operands are available. NVIDIA does this scheduling in their GPU compiler, but the RISC-V ISA leaves room for it to be implemented in hardware. Condor implements a &#8220;Time-Resource Matrix&#8221; solver for this problem, an innovative solution that they can pursue by <strong>building on existing RISC-V infrastructure</strong>.</p><p>RISC moves the burden of <strong>software optimization</strong> to the compiler. As <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Bharath Suresh&quot;,&quot;id&quot;:178190448,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23b7c14a-5bd1-4a78-9ac8-c5d6eda62bfc_2048x2048.jpeg&quot;,&quot;uuid&quot;:&quot;c853b1a6-3ecc-4d7e-b576-306116c90c10&quot;}" data-component-name="MentionToDOM">Bharath Suresh</span> said in his article, RISC is an apt acronym for &#8220;Relegate Important Stuff To Compiler&#8221;. The saving grace is that this compiler complexity is shared over many hardware targets, enabled by the ISA. 
It would be impractical to expect LLVM to be able to produce optimized code for a processor I invent, whereas I can easily piggyback on existing work if I target an established ISA.</p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">To summarize, RISC helps promote a more </span><strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">open, thriving ecosystem, encourages innovation and diversity in hardware, and spreads out the burden of optimization</span></strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);"> among different vertical parts of the stack (software, compiler, hardware).</span></p><h2><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">A Thought Experiment: no ISA or Compiler</span></h2><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">Underlying both sides of the ISA debate, computer engineering has a foundation with two critical pieces of technology: the </span><strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">ISA</span></strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);"> itself, and the </span><strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">compiler</span></strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">.</span></p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">Let&#8217;s imagine Musk&#8217;s proposed alternate world where these do not exist. To be clear, we are not advocating for this proposal (there is much </span><a href="https://engrlog.substack.com/p/why-skip-the-code-ship-the-binary">other writing</a><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);"> on this topic) other than using it for a thought experiment. Programs would be converted directly into machine code from high-level intent by an LLM (though the specifics of the method aren&#8217;t important for the point). 
In this world, we could still have RISC and CISC processors; the LLM would just have to know and understand how to map to the different instruction sets. Essentially, the LLM becomes a general purpose compiler across multiple ISAs.</span></p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">However, the ecosystem effects would not be the same, and in many ways would be inverted!</span></p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">Underlying the vibrant RISC hardware ecosystem was the hidden assumption that the compiler abstracts over implementation differences. If we delegate that to an LLM, RISC&#8217;s simplicity </span><em><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">becomes a liability</span></em><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">. Every microarchitectural optimization or quirk is now the LLM&#8217;s responsibility to learn separately, and CISC&#8217;s philosophy of &#8220;let the hardware handle complexity&#8221; becomes an advantage.</span></p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">It is still simpler to build RISC hardware, but without an ISA, the </span><strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">complexity of the full stack</span></strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);"> will fall entirely on the new hardware entrant. They cannot utilize work done by software or compiler authors, and will have to build or fine-tune a new LLM to target their hardware, with all the complexity that entails (datasets, benchmarks, training infrastructure, etc.). Practically, this makes it even more difficult than it already is for new hardware entrants.</span><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>The LLM now needs to embed knowledge of all the microarchitectural variations in hardware. 
For economic viability, hardware entrants will be incentivized to maintain LLM compatibility in novel releases, <strong>suppressing microarchitectural innovation</strong>.</p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">RISC vs. CISC also affects the </span><strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">complexity of the LLM</span></strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);"> that needs to be built to target the hardware. With CISC, fewer instructions are needed for the same program and the output space of the LLM is reduced. In fact, CISC was historically preferred for a similar reason: computer memory was slow and expensive.</span></p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">In some cases, CISC can also help abstract hardware differences away. For example, the </span><a href="https://thechipletter.substack.com/p/the-long-history-of-rep-movs"><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">REP MOVS</span></a><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);"> x86 instruction is a full translation of the intent (&#8220;move a string&#8221;), while in a load-store RISC architecture that does not support memory-memory operations, the LLM needs to perform many steps, and to keep track of which registers are free to implement the equivalent.</span></p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">Processors built on the CISC architectures have largely overcome their microarchitectural weaknesses by translating complex instructions into </span><a href="https://famberzbuilt.in/blog-details/micro-operations-ops-how-cpus-break-instructions-into-smaller-steps">micro-ops</a><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);"> to achieve similar hardware efficiencies as their RISC counterparts. 
In the proposed alternate world, RISC would also lose out on the other advantage it currently holds: easier performance optimization by the compiler.</span></p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">Putting this all together, we first realize that Musk&#8217;s proposal results in a significantly more consolidated ecosystem where </span><strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">innovation in hardware is more difficult across the board, and architectural diversity is stifled</span></strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">. A dark future indeed!</span></p><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">More unexpected is the realization that in this alternate future, </span><strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">RISC hardware becomes disproportionately </span></strong><em><strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">more difficult</span></strong></em><strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);"> to build</span></strong><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">. The ecosystem becomes more closed and concentrated, while CISC has a somewhat better chance of supporting different hardware with the same software.</span></p><h2><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">The Effects on Robotics</span></h2><p>We&#8217;ve <a href="https://www.avikde.me/p/the-first-paradigm-in-robotics-and">written before about</a> how computer engineering can anticipate aspects of technology development in the field of robotics. Could a RISC vs. CISC debate appear there, and have similar ecosystem effects?</p><p>Action tokens are analogous to instructions, and we have seen examples of <a href="https://www.avikde.me/p/action-tokens-fine-grained-vs-behavioral">fine- or coarse-grained tokens in robotics</a>. 
In a previous article I demonstrated that <a href="https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics">both are viable for a tabletop manipulation task</a>.</p><p>A Vision-Language-Action (VLA) model is an example of a &#8220;RISC&#8221; software layer that outputs fine-grained actions, and the outputs can be <strong>joint-level or Cartesian infinitesimal commands</strong>. The &#8220;CISC&#8221; software layer requires effectively a high-level task orchestrator outputting <strong>higher-level symbolic motion or task primitives</strong> (like &#8220;grasp object&#8221;). We have seen examples of this in classical AI dating back to <a href="https://en.wikipedia.org/wiki/Stanford_Research_Institute_Problem_Solver">STRIPS</a>, as well as in modern AI with VLMs like Gemini ER.</p><p>Since an <a href="https://www.avikde.me/p/a-multi-robot-brain-is-not-like-a">ISA or compiler equivalent does not exist today</a>, we end up in the &#8220;alternate history&#8221; version of the world from the previous section. What can we learn from it about the future of robotics?</p><p>Building &#8220;RISC&#8221; robots with only joint-level control functions remains a simpler task than also endowing them with higher-level task functions and autonomy. However, for the robots to be useful, those software components still need to be obtained from somewhere. Without the ISA layer, this effectively means that the software layer needs to have native support for the new hardware.</p><p>For <a href="https://www.avikde.me/p/debugging-as-architecture-insight">my VLA project</a>, I made my simulated robot arm choice based on what was supported by a VLA out-of-the-box, and I assume this is a common phenomenon. 
The result is <strong>consolidation and reduction of hardware diversity</strong>.</p><p>A <a href="https://arxiv.org/pdf/2512.12230">2024 study</a> on zero-shot fall recovery across seven humanoid morphologies noted difficulty transferring to robots that are &#8220;top-heavy, long-armed, or otherwise distant in morphology space&#8221;. The CISC equivalent has been explored less, but this <a href="https://sites.google.com/berkeley.edu/morphology-transfer">2020 paper from Berkeley</a> demonstrates that using &#8220;CISC&#8221; subgoals with a morphology-dependent low-level policy aids policy transfer to new morphologies.</p><p>Lastly, with the RISC perspective, the burden of optimization is pushed entirely onto either the AI model builder or the hardware vendor (who must now fine-tune the large model to work with their hardware). Unlike the multi-layer software &#8594; compiler &#8594; hardware stack, where smaller players can share the burden to build competent products, there is no way to spread this workload around. This affects ecosystem investment and control, both commercially and <a href="https://www.avikde.me/p/the-first-paradigm-in-robotics-and">in academic research</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/what-an-alternate-history-of-the/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/what-an-alternate-history-of-the/comments"><span>Leave a comment</span></a></p><p></p><h2>Closing Thoughts</h2><p>I have always had a fond impression of how RISC influenced the computing ecosystem, and it turns out that the ISA layer was a crucial reason for that. 
Elon Musk&#8217;s claim made me think of how that hypothetical scenario could have changed history, and it almost completely inverts the balance.</p><p>Robotics is early to this game, doesn&#8217;t have an equivalent of an ISA yet, and ecosystems are just beginning to take shape. For now, there are (very smart!) engineers at different research labs and companies choosing &#8220;instruction&#8221; granularity for their own technical reasons. As well they should &#8212; building a robotic solution poses enough challenges as it is. However, computer engineers at Intel were once doing the same, and it&#8217;s difficult to foresee the far-reaching ecosystem consequences of these decisions!</p><p>Even though robotics doesn&#8217;t have an &#8220;ISA&#8221; yet, there are some promising directions and paradigms I&#8217;d like to explore in an upcoming article. Make sure to subscribe for that, and thanks for reading!</p><p><em>If you enjoyed this post, please like (&#10084;&#65039;) and restack &#8212; it helps others find my writing. Subscribe to receive new posts. All of this is greatly appreciated.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/what-an-alternate-history-of-the?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/what-an-alternate-history-of-the?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h2>Further Reading</h2><p>In this article I talked through an &#8220;action token&#8221; in robotics, an equivalent of an &#8220;instruction&#8221; in computing:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;6e0a3172-0155-4d80-882e-e2ad77ac417e&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Action Tokens: Fine-Grained vs. Behavioral Primitives&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-06-03T16:01:58.224Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!gpDc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/action-tokens-fine-grained-vs-behavioral&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:200467846,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:11,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Read about the RISC-spurned ecosystem in computing:</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:118332198,&quot;url&quot;:&quot;https://thechipletter.substack.com/p/the-risc-wars-part-1-the-cambrian-c55&quot;,&quot;publication_id&quot;:1063960,&quot;publication_name&quot;:&quot;The Chip 
Letter&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!vwjY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffe682d7-ab93-463b-b714-8f98c0c072d2_1280x1280.png&quot;,&quot;title&quot;:&quot;The RISC Wars Part 1 : The Cambrian Explosion&quot;,&quot;truncated_body_text&quot;:&quot;I&#8217;d like to start this week&#8217;s post with a small confession. In last week&#8217;s post, I called Berkeley RISC-I &#8216;the first RISC microprocessor&#8217;. In fact, some believe that a derivative of the IBM 801, known as ROMP, was the first RISC microprocessor. The story of the 801&#8217;s successors, including ROMP, will be the subject of a later post. I believe that RISC-I &#8230;&quot;,&quot;date&quot;:&quot;2023-04-30T18:14:59.079Z&quot;,&quot;like_count&quot;:29,&quot;comment_count&quot;:24,&quot;bylines&quot;:[{&quot;id&quot;:102722254,&quot;name&quot;:&quot;Babbage&quot;,&quot;handle&quot;:&quot;thechipletter&quot;,&quot;previous_name&quot;:&quot;The Chip Letter&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F82525b9c-ee3c-4996-916c-54267a4d354b_416x416.png&quot;,&quot;bio&quot;:&quot;Computer history and architecture&quot;,&quot;profile_set_up_at&quot;:&quot;2022-08-28T13:07:25.701Z&quot;,&quot;reader_installed_at&quot;:&quot;2022-10-20T11:45:48.505Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:1012118,&quot;user_id&quot;:102722254,&quot;publication_id&quot;:1063960,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:1063960,&quot;name&quot;:&quot;The Chip Letter&quot;,&quot;subdomain&quot;:&quot;thechipletter&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Computer history and 
architecture&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ffe682d7-ab93-463b-b714-8f98c0c072d2_1280x1280.png&quot;,&quot;author_id&quot;:102722254,&quot;primary_user_id&quot;:102722254,&quot;theme_var_background_pop&quot;:&quot;#FF6B00&quot;,&quot;created_at&quot;:&quot;2022-08-28T13:07:52.880Z&quot;,&quot;email_from_name&quot;:&quot;The Chip Letter&quot;,&quot;copyright&quot;:&quot;The Chip Letter&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;magaziney&quot;,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100,&quot;status&quot;:{&quot;bestsellerTier&quot;:100,&quot;subscriberTier&quot;:1,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:{&quot;type&quot;:&quot;bestseller&quot;,&quot;tier&quot;:100},&quot;subscriber&quot;:null}}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;,&quot;source&quot;:null}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://thechipletter.substack.com/p/the-risc-wars-part-1-the-cambrian-c55?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!vwjY!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffe682d7-ab93-463b-b714-8f98c0c072d2_1280x1280.png" loading="lazy"><span class="embedded-post-publication-name">The Chip Letter</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">The RISC Wars Part 1 : The Cambrian 
Explosion</div></div><div class="embedded-post-body">I&#8217;d like to start this week&#8217;s post with a small confession. In last week&#8217;s post, I called Berkeley RISC-I &#8216;the first RISC microprocessor&#8217;. In fact, some believe that a derivative of the IBM 801, known as ROMP, was the first RISC microprocessor. The story of the 801&#8217;s successors, including ROMP, will be the subject of a later post. I believe that RISC-I &#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">3 years ago &#183; 29 likes &#183; 24 comments &#183; Babbage</div></a></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>There is an orthogonal topic of ISA <em>licensing</em> which is unrelated to the RISC vs. CISC-ness, so we will skip over it for this article.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">There have been real examples of cautionary tales of hardware startups deviating too far from existing ISAs / toolchains and suffering &#8212; Graphcore is probably the </span><a href="https://sifted.eu/articles/graphcore-finances">cautionary tale</a><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">. We are part way through an </span><a href="https://chipinsights.net/p/mapping-algorithms-to-custom-silicon">article series</a><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);"> about how to approach this problem with the hands-on example of a matrix multiplication accelerator.</span></p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Action Tokens: Fine-Grained vs. 
Behavioral Primitives]]></title><description><![CDATA[Why the LLM tokenization debate matters for physical AI, and what biology tells us]]></description><link>https://www.avikde.me/p/action-tokens-fine-grained-vs-behavioral</link><guid isPermaLink="false">https://www.avikde.me/p/action-tokens-fine-grained-vs-behavioral</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Wed, 03 Jun 2026 16:01:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gpDc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When Claude Opus 4.7 got released, it got some backlash from people who noticed that it was <a href="https://www.reddit.com/r/ClaudeAI/comments/1sn6eud/opus_47_consumes_more_tokens_due_to_the_new/">using up tokens faster for their tasks</a>. The cause of this was a new tokenizer that would use more tokens for the same query or response.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Uzom!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a53b85a-bb25-46c6-a8cf-f423a7f1276d_1398x930.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uzom!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a53b85a-bb25-46c6-a8cf-f423a7f1276d_1398x930.png 424w, https://substackcdn.com/image/fetch/$s_!Uzom!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a53b85a-bb25-46c6-a8cf-f423a7f1276d_1398x930.png 848w, 
https://substackcdn.com/image/fetch/$s_!Uzom!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a53b85a-bb25-46c6-a8cf-f423a7f1276d_1398x930.png 1272w, https://substackcdn.com/image/fetch/$s_!Uzom!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a53b85a-bb25-46c6-a8cf-f423a7f1276d_1398x930.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uzom!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a53b85a-bb25-46c6-a8cf-f423a7f1276d_1398x930.png" width="1398" height="930" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a53b85a-bb25-46c6-a8cf-f423a7f1276d_1398x930.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:930,&quot;width&quot;:1398,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:467998,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/200467846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a53b85a-bb25-46c6-a8cf-f423a7f1276d_1398x930.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Uzom!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a53b85a-bb25-46c6-a8cf-f423a7f1276d_1398x930.png 424w, https://substackcdn.com/image/fetch/$s_!Uzom!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a53b85a-bb25-46c6-a8cf-f423a7f1276d_1398x930.png 
848w, https://substackcdn.com/image/fetch/$s_!Uzom!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a53b85a-bb25-46c6-a8cf-f423a7f1276d_1398x930.png 1272w, https://substackcdn.com/image/fetch/$s_!Uzom!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a53b85a-bb25-46c6-a8cf-f423a7f1276d_1398x930.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Users were similarly upset when <a href="https://www.reddit.com/r/ClaudeCode/comments/1tsjlg0/opus_48_is_a_killer/">Opus 
4.8 would use up tokens for reasoning</a>, costing $$ while not producing any actionable output:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yWJJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade7f717-5045-4914-94c6-507ab3313020_1382x806.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yWJJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade7f717-5045-4914-94c6-507ab3313020_1382x806.png 424w, https://substackcdn.com/image/fetch/$s_!yWJJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade7f717-5045-4914-94c6-507ab3313020_1382x806.png 848w, https://substackcdn.com/image/fetch/$s_!yWJJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade7f717-5045-4914-94c6-507ab3313020_1382x806.png 1272w, https://substackcdn.com/image/fetch/$s_!yWJJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade7f717-5045-4914-94c6-507ab3313020_1382x806.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yWJJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade7f717-5045-4914-94c6-507ab3313020_1382x806.png" width="1382" height="806" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ade7f717-5045-4914-94c6-507ab3313020_1382x806.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:806,&quot;width&quot;:1382,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:201858,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/200467846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade7f717-5045-4914-94c6-507ab3313020_1382x806.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yWJJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade7f717-5045-4914-94c6-507ab3313020_1382x806.png 424w, https://substackcdn.com/image/fetch/$s_!yWJJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade7f717-5045-4914-94c6-507ab3313020_1382x806.png 848w, https://substackcdn.com/image/fetch/$s_!yWJJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade7f717-5045-4914-94c6-507ab3313020_1382x806.png 1272w, https://substackcdn.com/image/fetch/$s_!yWJJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade7f717-5045-4914-94c6-507ab3313020_1382x806.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>More broadly, as LLMs continue to gain use, the answers to many questions seem to return to tokens. <strong>Q</strong>: How responsive is an LLM? <strong>A</strong>: tokens/second. <strong>Q</strong>: How much do you spend on your LLM? <strong>A</strong>: $/token.</p><p>In this article, we will go over:</p><ul><li><p>What tokens and tokenizers are in LLMs, and what fundamental tradeoffs (in algorithm characteristics, computation, and memory) tokens induce.</p></li><li><p>How this tradeoff is going to be even more central in robotics or physical AI, and where biology lies on the spectrum.</p></li></ul><p>There is also a very interesting parallel between an action <em>token</em> in robotics and an <em>instruction</em> in computing, and the juxtaposition it creates with the <a href="https://chipinsights.net/p/the-isa-debate">RISC vs. CISC debate</a>. 
I&#8217;ll cover that in a follow-on post that I&#8217;m excited to write &#8212; make sure to subscribe to get it in your app or inbox:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><p></p><h2>Tokens in LLMs</h2><p>In LLMs, tokens are a numerical encoding of language. I&#8217;ll describe the tokenization process briefly here for the purposes of this article, with references to a fuller analysis in the <a href="https://mlsysbook.ai/tinytorch/modules/10_tokenization_ABOUT.html">TinyTorch course notes</a>.</p><p>One idea is to make a word a token. That&#8217;s more or less how humans learn a language: a dictionary will have entries corresponding to words.</p><blockquote><p>Neural networks process numbers, not text. When you pass the string &#8220;Hello&#8221; to a model, it must first become a sequence of integers. This transformation happens in four steps: <mark data-color="#ffff00" style="background-color: rgb(255, 255, 0); color: rgb(0, 0, 0);">split text into tokens</mark> (units of meaning), build a vocabulary mapping each unique token to an integer ID, encode text by looking up each token&#8217;s ID, and enable decoding to reconstruct the original text from IDs.</p></blockquote><p>The tokenization process decides how the &#8220;split&#8221; step happens, and this can be done in various ways.</p><blockquote><p>The simplest approach treats each character as a token. 
Consider the word &#8220;hello&#8221;: split into characters <code>['h', 'e', 'l', 'l', 'o']</code>, build a vocabulary with IDs <code>{'h': 1, 'e': 2, 'l': 3, 'o': 4}</code>, encode to <code>[1, 2, 3, 3, 4]</code>, and decode back by reversing the lookup.</p></blockquote><p>Even at this point, some of the tradeoffs are becoming clear. This mapping (which is used in the forward direction for encoding, and in the reverse direction for decoding) needs to be stored in <em>memory</em>:</p><blockquote><p>Character vocabularies are tiny (typically 50-200 tokens depending on language), which means small embedding tables. A 100-character vocabulary with 512-dimensional embeddings requires only 51,200 parameters, about 200 KB of memory. This is dramatically smaller than word-level vocabularies with 100,000+ entries.</p></blockquote><p>However, a different problem arises when it comes time to actually process the encoded sequence. This processing is algorithm-dependent (more on this later). When an LLM receives a query, the token sequence passes through a stack of transformer blocks. The length of this sequence has a huge impact on the number of <a href="https://www.viksnewsletter.com/p/a-primer-on-transformer-architecture">computations in the transformer</a>.</p><blockquote><p>Character tokenization has a fatal flaw for neural networks: sequences are too long. A 50-word sentence might produce 250 character tokens. Processing 250 tokens through self-attention layers is computationally expensive, and the model must learn to compose characters into words from scratch.</p></blockquote><p>So, small vocabularies mean low <strong>memory</strong> usage for the dictionary, but long sequences for <strong>computation</strong>. Large vocabularies (like full words) have high memory usage, but produce shorter sequences.</p><p>The last sentence in the quote above brings another side of the argument into the picture: how <strong>generative</strong> are the tokens? 
Stringing words together to make sensible sentences is easier than stringing letters together! A letter by itself doesn&#8217;t &#8220;mean&#8221; much, but a word carries a lot more information.</p><p>There is a spectrum between these options, and most LLMs today use a &#8220;subword&#8221; as a token, typically built with a process called byte pair encoding (BPE). Importantly, this type of encoding is easily <em>learned from data</em>: commonly used character sequences like &#8220;un-&#8221; or &#8220;pre-&#8221; will become tokens. If you have an LLM-sized training corpus, you can just check which character sequences make sense as tokens over <em>all of it</em>.</p><blockquote><p>BPE solves this by learning subword units. The algorithm is elegant: start with a character-level vocabulary, count all adjacent character pairs in the corpus, merge the most frequent pair into a new token, and repeat until reaching the target vocabulary size.</p></blockquote><p>Before we move on, let&#8217;s summarize some of the tradeoffs tokenization induces, in a spectrum between <em>fine</em> (like letters) and <em>coarse</em> (like words) tokens:</p><ol><li><p><strong>Memory</strong>: The embedding table is smaller with fine tokens</p></li><li><p><strong>Computation</strong>: Sequences are shorter with coarse tokens, but the cost depends on the algorithm</p></li><li><p><strong>Generation</strong>: Easier with coarse tokens</p></li><li><p><strong>Transfer</strong>: Words might be more task-specific (e.g. a medical journal may have different types of words than a novel), but capture reusable task structure better (e.g. you can use an English-to-Italian dictionary to get by in Italy, vs. 
learning Italian from scratch from letters)</p></li><li><p>Execution latency and <strong>architecture flexibility</strong>: Fine tokens require very fast inference loops, but coarse tokens relax that and give more flexibility.</p></li></ol><h2>Action Tokens in Robotics</h2><p>Robotics does physical work in the real world, and the most important part of that is the action. In this post, we will focus on action tokens instead of tokens for sensing or task input.</p><p>By analogy, a token is a small component of a larger task-directed behavior. If the task is to pick up a block, should a token be to move the joints by a few degrees, a Cartesian waypoint, or to complete the full grasp?</p><p>In a previous post, I wrote about two ends of this spectrum for exactly a task like this:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;ac987ea1-7eb8-4b49-bc72-05aed833202c&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;A coding agent equivalent for robotics pipelines&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-26T18:18:42.566Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!IGFD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee72e6-b019-4210-ab60-5d852f7b3f90_640x480.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:192049893,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:14,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>You can try the <a href="https://avikde.github.io/vla-pipeline/">demo here</a> too.</p><p>Basically all VLA models output fine-grained tokens, and the alternative approach used a VLM outputting coarse &#8220;move&#8221;, &#8220;grasp&#8221;, etc. commands.</p><p>Unlike the LLM &#8220;chatbot&#8221; interaction model where encoding and decoding are both part of standard usage, robot actions strictly only need to be <em>decoded</em> at inference time. 
If we are learning from imitation or demonstration, then an encoding step would be a part of training.</p><p>The tradeoffs from above carry over to robot actions too, with some interesting implications:</p><ol><li><p><strong>Memory</strong>: This corresponds to the size of the &#8220;library&#8221; of action primitives that the robot is capable of executing. You don&#8217;t really need to store anything if the primitives are small joint motions, but memory demand could balloon if the primitives are more complex. For example, a robot arm may have hundreds or thousands of preset configurations, reaching motions, etc.</p></li><li><p><strong>Computation</strong>: As exemplified by the robotics pipeline demo above, the Gemini ER model has a computationally simpler problem producing sequences of commands.</p></li><li><p><strong>Generation</strong>: Easier with coarse tokens, as above</p></li><li><p><strong>Transfer</strong>: Motion primitives might be more task-specific (e.g. a pick-and-place vocabulary would not have &#8220;catch&#8221; and &#8220;toss&#8221; actions for juggling), but capture reusable task structure better (e.g. the primitives in the demo above are useful for package logistics as well as loading a dishwasher)</p></li><li><p>Execution latency and <strong>architecture flexibility</strong>: The coarse tokens in the demo allowed us to combine them hierarchically with model-based planners, whereas the VLA needs to run end-to-end</p></li></ol><h2>Behavioral Primitives in Robotics and Biology</h2><p>Coarse action tokens are sometimes referred to as <strong>behavioral primitives</strong>.</p><p>The parallel to language (building up sentences from subwords and words) has been quite literally expressed in some robotics research (as a disclaimer, I was a Ph.D. 
student in the same lab):</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rNuT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7631af9a-1591-4163-9c08-d459e6621e42_640x212.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rNuT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7631af9a-1591-4163-9c08-d459e6621e42_640x212.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rNuT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7631af9a-1591-4163-9c08-d459e6621e42_640x212.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rNuT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7631af9a-1591-4163-9c08-d459e6621e42_640x212.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rNuT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7631af9a-1591-4163-9c08-d459e6621e42_640x212.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rNuT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7631af9a-1591-4163-9c08-d459e6621e42_640x212.jpeg" width="640" height="212" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7631af9a-1591-4163-9c08-d459e6621e42_640x212.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:212,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:39742,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/200467846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7631af9a-1591-4163-9c08-d459e6621e42_640x212.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rNuT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7631af9a-1591-4163-9c08-d459e6621e42_640x212.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rNuT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7631af9a-1591-4163-9c08-d459e6621e42_640x212.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rNuT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7631af9a-1591-4163-9c08-d459e6621e42_640x212.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rNuT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7631af9a-1591-4163-9c08-d459e6621e42_640x212.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">A six-legged robot generating leaping behaviors by stringing &#8220;word&#8221; primitives into &#8220;sentences&#8221; (<a 
href="https://ieeexplore.ieee.org/document/6630928">paper</a>)</figcaption></figure></div><p>In robotic behavior cloning, object-level and action-level abstractions can be learned from training data (the BPE equivalent):</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8g1V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b7da7-6011-4571-8b94-e0bd84094b54_1522x211.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8g1V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b7da7-6011-4571-8b94-e0bd84094b54_1522x211.png 424w, https://substackcdn.com/image/fetch/$s_!8g1V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b7da7-6011-4571-8b94-e0bd84094b54_1522x211.png 848w, https://substackcdn.com/image/fetch/$s_!8g1V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b7da7-6011-4571-8b94-e0bd84094b54_1522x211.png 1272w, https://substackcdn.com/image/fetch/$s_!8g1V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b7da7-6011-4571-8b94-e0bd84094b54_1522x211.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8g1V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b7da7-6011-4571-8b94-e0bd84094b54_1522x211.png" width="1456" height="202" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/948b7da7-6011-4571-8b94-e0bd84094b54_1522x211.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:202,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Refer to caption&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Refer to caption" title="Refer to caption" srcset="https://substackcdn.com/image/fetch/$s_!8g1V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b7da7-6011-4571-8b94-e0bd84094b54_1522x211.png 424w, https://substackcdn.com/image/fetch/$s_!8g1V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b7da7-6011-4571-8b94-e0bd84094b54_1522x211.png 848w, https://substackcdn.com/image/fetch/$s_!8g1V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b7da7-6011-4571-8b94-e0bd84094b54_1522x211.png 1272w, https://substackcdn.com/image/fetch/$s_!8g1V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b7da7-6011-4571-8b94-e0bd84094b54_1522x211.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Finding primitives from training data (<a href="https://arxiv.org/html/2405.03864v1">paper</a>)</figcaption></figure></div><p><a href="https://www.nature.com/articles/s41598-024-82472-x">This paper</a> shows the emergence of synergies from reinforcement learning of humanoid locomotion.</p><p>In 
biology, motor synergies coordinated in the spinal cord allow multiple joints to be coordinated to produce coarse tokens:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TiZm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4830d959-2d7f-45eb-93ff-116a2a2eaba2_3591x1998.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TiZm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4830d959-2d7f-45eb-93ff-116a2a2eaba2_3591x1998.jpeg 424w, https://substackcdn.com/image/fetch/$s_!TiZm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4830d959-2d7f-45eb-93ff-116a2a2eaba2_3591x1998.jpeg 848w, https://substackcdn.com/image/fetch/$s_!TiZm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4830d959-2d7f-45eb-93ff-116a2a2eaba2_3591x1998.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!TiZm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4830d959-2d7f-45eb-93ff-116a2a2eaba2_3591x1998.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TiZm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4830d959-2d7f-45eb-93ff-116a2a2eaba2_3591x1998.jpeg" width="1456" height="810" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4830d959-2d7f-45eb-93ff-116a2a2eaba2_3591x1998.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:704974,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/200467846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4830d959-2d7f-45eb-93ff-116a2a2eaba2_3591x1998.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TiZm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4830d959-2d7f-45eb-93ff-116a2a2eaba2_3591x1998.jpeg 424w, https://substackcdn.com/image/fetch/$s_!TiZm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4830d959-2d7f-45eb-93ff-116a2a2eaba2_3591x1998.jpeg 848w, https://substackcdn.com/image/fetch/$s_!TiZm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4830d959-2d7f-45eb-93ff-116a2a2eaba2_3591x1998.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!TiZm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4830d959-2d7f-45eb-93ff-116a2a2eaba2_3591x1998.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Synergies in biology (<a href="https://www.science.org/doi/10.1126/scirobotics.ado9509">Link to paper</a>)</figcaption></figure></div><p>The resulting behavioral primitives can be recognizable subcomponents of tasks:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gpDc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!gpDc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png 424w, https://substackcdn.com/image/fetch/$s_!gpDc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png 848w, https://substackcdn.com/image/fetch/$s_!gpDc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png 1272w, https://substackcdn.com/image/fetch/$s_!gpDc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gpDc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png" width="1456" height="1072" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1072,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2423232,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/200467846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gpDc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png 424w, https://substackcdn.com/image/fetch/$s_!gpDc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png 848w, https://substackcdn.com/image/fetch/$s_!gpDc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png 1272w, https://substackcdn.com/image/fetch/$s_!gpDc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d65085-e049-4bda-84f8-fb9eed77a16e_2068x1522.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Synergistic coordination results in task-directed primitives (<a href="https://www.jneurosci.org/content/41/32/6878">paper</a>)</figcaption></figure></div><h2>Closing Thoughts</h2><p>A token in an LLM or a robot can be either very fine-grained or coarse. Due to the rich tradeoff space, there is no single correct choice, and the best decision may change over time and for different applications (as Anthropic found with Opus 4.7).</p><p>Not enough attention has been paid yet to the implications of this choice for robotics. In addition to the technical tradeoffs I discussed above, it can have some very important downstream effects on the ecosystem of research: fine-grained tokens make the robotic stack monolithic, while coarse ones allow for parallel progress in low-level control and higher-level cognition that can be <a href="https://www.avikde.me/p/a-multi-robot-brain-is-not-like-a">vertically integrated for a particular application</a>. This also has an impact on how the <a href="https://www.avikde.me/p/the-first-paradigm-in-robotics-and">research ecosystem can be structured</a>.</p><p>The course of <a href="https://www.avikde.me/p/the-ai-world-models-debate-and-its">world model research</a> and <a href="https://itcanthink.substack.com/p/a-week-of-scaling-announcements-in">continued robotic deployments</a> in 2026 may shed some more light on the best choice for different applications. I look forward to covering major developments in this area in future posts. 
Also, stay tuned for a follow-up on the parallel to instructions in computing and the RISC vs. CISC debate.</p><p>Thanks for reading!</p><p><em>If you enjoyed this post, please like (&#10084;&#65039;) and restack &#8212; it helps others find my writing. Subscribe to receive new posts. All of this is greatly appreciated.</em></p>]]></content:encoded></item><item><title><![CDATA[The Loops and Hierarchies of Embodied Intelligence]]></title><description><![CDATA[Can we get embodied intelligence by connecting cameras and motors to an AI brain?]]></description><link>https://www.avikde.me/p/the-loops-and-hierarchies-of-embodied</link><guid isPermaLink="false">https://www.avikde.me/p/the-loops-and-hierarchies-of-embodied</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Wed, 27 May 2026 12:51:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/WOPED7I5Lac" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The terms &#8220;embodied intelligence&#8221;, &#8220;physical intelligence&#8221;, and &#8220;physical AI&#8221; appear very often in press releases and technical articles these days, but they may refer to different things.</p><p>On one end of the spectrum, organizations are building AI brains using foundation models (principally <a href="https://itcanthink.substack.com/p/vision-language-action-models-and">VLAs</a> at the moment), and deploying them to humanoid robots or robot arms for some tasks. Taking a term the public understands (<em>artificial intelligence</em>) and extending it to physical work via robotics is what is being called <em>embodied</em> or <em>physical</em> intelligence. 
For example, <a href="https://www.pi.website/">Physical Intelligence</a> is the literal name of one of the leading organizations, and <a href="https://merics.org/en/report/embodied-ai-chinas-ambitious-path-transform-its-robotics-industry">Embodied AI</a> is labeled a priority in the CCP&#8217;s 15th Five-Year Plan for China&#8217;s socioeconomic development.</p><p>At the other end of the spectrum, <a href="https://www.darpa.mil/news/2026/rethinking-robotics">DARPA sent out a call</a> (submissions due on the date this article is published!) to researchers requesting information about <em>physical intelligence</em>, referring to intelligent materials that implement sensing and actuation without needing any brain at all.</p><p>These apparently contradictory viewpoints are just the tip of the iceberg of a rich body of literature about embodied intelligence. In this article, we&#8217;ll review a bit of that history and then see if it can help us build better robots today.</p><h2>A Brief Review of Embodied Intelligence</h2><p>While approaches differ, it&#8217;s an uncontroversial opinion that having a physical instantiation (a body) is <em>helpful</em> to developing intelligence. Even the most practically motivated <a href="https://www.nature.com/articles/d42473-026-00119-z">robotics companies are drawing inspiration</a> from comparative psychology:</p><blockquote><p>A classic <a href="https://psycnet.apa.org/doiLanding?doi=10.1037%2Fh0040546">experiment in 1963</a> was conducted to investigate how vision and motor activity are linked in perceptual development. In the setup, kittens were placed in a carousel &#8212; some walking freely, others carried in harnesses. Though all of them saw the same scenes, only the active kittens developed normal depth perception. The study showed that motor activity plays a decisive role in visually guided perception and motor learning.</p><p>Robots with embodied intelligence learn in much the same way. When they move, probe and act, sensory input can be connected to the consequences of their own behaviour. This feedback allows them to build more accurate internal models.</p></blockquote><p>This suggests that at least the <em>development</em> of intelligence requires a body. In principle, once the &#8220;accurate internal models&#8221; are built, a fully developed AI brain might then function without that body. 
There are also stronger views: that the body can never be separated from the brain, and that it is a <em>constitutive</em> part of intelligence.</p><p>I&#8217;ll go over some of the viewpoints along this spectrum below. If I missed an important reference, please comment below; it would help me as well as other readers!</p><h3>Gibson&#8217;s Affordances (1979)</h3><p>Gibson&#8217;s <a href="https://www.taylorfrancis.com/books/mono/10.4324/9781315740218/ecological-approach-visual-perception-james-gibson">seminal book</a> claims that perception isn&#8217;t a passive process, but the detection of action possibilities (&#8220;affordances&#8221;) relative to a particular body.</p><blockquote><p>The basic assumption is that vision depends on the eye which is connected to the brain. The author suggests that natural vision depends on the eyes in the head on a body supported by the ground, the brain being only the central organ of a complete visual system. When no constraints are put on the visual system, people look around, walk up to something interesting and move around it so as to see it from all sides, and go from one vista to another.</p></blockquote><p>As an example, a step affords climbing for a human leg but not for a mouse. The &#8220;climbability&#8221; is a coupled property of the body and the environment. 
Gibson goes further, arguing that the animal perceives the affordance directly and immediately, without reconciling it with an internal model in the brain.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ntlt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca97842e-6b2d-49b4-b2f0-2189ad71daed_612x246.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ntlt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca97842e-6b2d-49b4-b2f0-2189ad71daed_612x246.png 424w, https://substackcdn.com/image/fetch/$s_!ntlt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca97842e-6b2d-49b4-b2f0-2189ad71daed_612x246.png 848w, https://substackcdn.com/image/fetch/$s_!ntlt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca97842e-6b2d-49b4-b2f0-2189ad71daed_612x246.png 1272w, https://substackcdn.com/image/fetch/$s_!ntlt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca97842e-6b2d-49b4-b2f0-2189ad71daed_612x246.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ntlt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca97842e-6b2d-49b4-b2f0-2189ad71daed_612x246.png" width="612" height="246" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ca97842e-6b2d-49b4-b2f0-2189ad71daed_612x246.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:246,&quot;width&quot;:612,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8145,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/199171451?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca97842e-6b2d-49b4-b2f0-2189ad71daed_612x246.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ntlt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca97842e-6b2d-49b4-b2f0-2189ad71daed_612x246.png 424w, https://substackcdn.com/image/fetch/$s_!ntlt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca97842e-6b2d-49b4-b2f0-2189ad71daed_612x246.png 848w, https://substackcdn.com/image/fetch/$s_!ntlt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca97842e-6b2d-49b4-b2f0-2189ad71daed_612x246.png 1272w, https://substackcdn.com/image/fetch/$s_!ntlt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca97842e-6b2d-49b4-b2f0-2189ad71daed_612x246.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Gibsonian affordances for vision &#8212; there is no brain in this picture. (<a href="https://www.semanticscholar.org/paper/Affordances%3A-Clarifying-and-Evolving-a-Concep-McGrenere-Ho/bfdd88499bd2b7971ce2ef976e3601efaa320297">source</a>)</figcaption></figure></div><p>The takeaway here is that affordances inseparably tie any notion of intelligent perception to the actual sensor itself. 
You can&#8217;t move a &#8220;perception skill&#8221; from one embodiment to another because that changes the affordance structure.</p><h3>Brooks&#8217; Subsumption Architecture (1987)</h3><p>Rodney Brooks&#8217; 1987 paper &#8220;<a href="https://people.csail.mit.edu/brooks/papers/representation.pdf">Intelligence Without Representation</a>&#8221; showcases a couple of interesting concepts.</p><p>Conventional AI at the time used a centralized <em>Sense &#8594; Plan &#8594; Act</em> pipeline, a hierarchy of components performing separate functions. Instead, Brooks suggests a hierarchy by objective, where each component does all the sensing, planning, and acting it needs. For example, a component could be <code>avoid_collision</code>, and a second component could be <code>move_to_goal</code>. Each of these components has its own objective, and can take precedence over or &#8220;subsume&#8221; another.</p><p>The second concept is (as the paper&#8217;s title suggests) a complete rejection of internal models and representations. Brooks insists that the world is the model, and the only way to develop autonomous functions is by directly interacting with the environment.</p><p>The important takeaway for us is that there is no brain at all. The body (comprising sensors and motors) <em>is</em> the cognitive system. 
This concept resulted in autonomy in a variety of robots built by Brooks and colleagues:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1-Id!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d3cd1b-3a92-46d2-a6cd-7b929057539e_876x620.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1-Id!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d3cd1b-3a92-46d2-a6cd-7b929057539e_876x620.png 424w, https://substackcdn.com/image/fetch/$s_!1-Id!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d3cd1b-3a92-46d2-a6cd-7b929057539e_876x620.png 848w, https://substackcdn.com/image/fetch/$s_!1-Id!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d3cd1b-3a92-46d2-a6cd-7b929057539e_876x620.png 1272w, https://substackcdn.com/image/fetch/$s_!1-Id!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d3cd1b-3a92-46d2-a6cd-7b929057539e_876x620.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1-Id!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d3cd1b-3a92-46d2-a6cd-7b929057539e_876x620.png" width="513" height="363.0821917808219" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51d3cd1b-3a92-46d2-a6cd-7b929057539e_876x620.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:620,&quot;width&quot;:876,&quot;resizeWidth&quot;:513,&quot;bytes&quot;:418223,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/199171451?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d3cd1b-3a92-46d2-a6cd-7b929057539e_876x620.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1-Id!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d3cd1b-3a92-46d2-a6cd-7b929057539e_876x620.png 424w, https://substackcdn.com/image/fetch/$s_!1-Id!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d3cd1b-3a92-46d2-a6cd-7b929057539e_876x620.png 848w, https://substackcdn.com/image/fetch/$s_!1-Id!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d3cd1b-3a92-46d2-a6cd-7b929057539e_876x620.png 1272w, https://substackcdn.com/image/fetch/$s_!1-Id!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d3cd1b-3a92-46d2-a6cd-7b929057539e_876x620.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The other takeaway (which we will return to) is that algorithms can be distributed in parallel in different parts of the body. For example, a small loop including only the distance sensors and a component of the motor commands would serve the obstacle avoidance function.</p><h3>Predictive Coding (1999)</h3><p>This <a href="https://www.nature.com/articles/nn0199_79">seminal work by Rao and Ballard</a> provides a <a href="https://open.substack.com/pub/wheremachinesthink/p/the-case-for-world-models-part-i?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">neuroscientific foundation for internal &#8220;world&#8221; models</a>. 
The initial 1999 paper focused on perception in the visual cortex, and its predictive coding account is a widely accepted hypothesis for this process:</p><blockquote><p>Rao and Ballard implemented a simple model of the visual cortex, using a 3-layer hierarchical neural network (they numbered the layers 0, 1, and 2) with two-way connections: predictions flowed from the higher to lower layers and errors, or residuals, went from lower to higher layers.</p><p>&#8230;</p><p>Rao and Ballard found that their network spontaneously discovered hierarchical processing: layer 1 learned to recognize bars and edges, while layer 2 (the topmost layer) learned to compose features learned by layer 1 to recognize more abstract features.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jfHf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2c3c81f-43be-43cd-91bb-fe697ea63622_1833x2186.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jfHf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2c3c81f-43be-43cd-91bb-fe697ea63622_1833x2186.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jfHf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2c3c81f-43be-43cd-91bb-fe697ea63622_1833x2186.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jfHf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2c3c81f-43be-43cd-91bb-fe697ea63622_1833x2186.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!jfHf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2c3c81f-43be-43cd-91bb-fe697ea63622_1833x2186.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jfHf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2c3c81f-43be-43cd-91bb-fe697ea63622_1833x2186.jpeg" width="335" height="399.4230769230769" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2c3c81f-43be-43cd-91bb-fe697ea63622_1833x2186.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1736,&quot;width&quot;:1456,&quot;resizeWidth&quot;:335,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jfHf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2c3c81f-43be-43cd-91bb-fe697ea63622_1833x2186.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jfHf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2c3c81f-43be-43cd-91bb-fe697ea63622_1833x2186.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jfHf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2c3c81f-43be-43cd-91bb-fe697ea63622_1833x2186.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!jfHf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2c3c81f-43be-43cd-91bb-fe697ea63622_1833x2186.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Predictive coding for perceptual processing showing interaction between internal models and sensory stimulus (<a href="https://open.substack.com/pub/wheremachinesthink/p/the-case-for-world-models-part-i?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">Source</a>)</figcaption></figure></div><p>The first takeaway is that neither direction 
(eyes to brain, or brain to eyes) stands on its own. The prediction arrows justify <a href="https://www.avikde.me/p/the-ai-world-models-debate-and-its?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">world models that are currently under active research in AI and robotics</a>, and the sensory stimulus arrows similarly show that perceptual models in the brain don&#8217;t work without the sensors themselves.</p><p>The second takeaway here is the central role of a &#8220;brain,&#8221; which is a departure from Gibson or Brooks. Still, the necessity for a brain does not reduce the significance of the body (and the brain&#8217;s coupling to it) in this view.</p><h3>Friston&#8217;s Free Energy (2006)</h3><p>Friston extended the predictive processing idea from perception to action, and to a <a href="https://www.sciencedirect.com/science/article/abs/pii/S092842570600060X">broader principle for the working of the brain</a>. Per this theory, the brain minimizes a free energy that, roughly speaking, contains terms related to prediction error (similar to above) and also value (attaining a goal).</p><p>It is easy to illustrate with an example of an arm asked to point to a green circle:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P83j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3390e4e3-d63f-4d69-971f-45d3f672ccc5_780x524.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P83j!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3390e4e3-d63f-4d69-971f-45d3f672ccc5_780x524.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!P83j!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3390e4e3-d63f-4d69-971f-45d3f672ccc5_780x524.jpeg 848w, https://substackcdn.com/image/fetch/$s_!P83j!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3390e4e3-d63f-4d69-971f-45d3f672ccc5_780x524.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!P83j!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3390e4e3-d63f-4d69-971f-45d3f672ccc5_780x524.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P83j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3390e4e3-d63f-4d69-971f-45d3f672ccc5_780x524.jpeg" width="588" height="395.0153846153846" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3390e4e3-d63f-4d69-971f-45d3f672ccc5_780x524.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:524,&quot;width&quot;:780,&quot;resizeWidth&quot;:588,&quot;bytes&quot;:61905,&quot;alt&quot;:&quot;Figure 2: A demonstration of cued reaching movements.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 2: A demonstration of cued reaching movements." title="Figure 2: A demonstration of cued reaching movements." 
srcset="https://substackcdn.com/image/fetch/$s_!P83j!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3390e4e3-d63f-4d69-971f-45d3f672ccc5_780x524.jpeg 424w, https://substackcdn.com/image/fetch/$s_!P83j!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3390e4e3-d63f-4d69-971f-45d3f672ccc5_780x524.jpeg 848w, https://substackcdn.com/image/fetch/$s_!P83j!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3390e4e3-d63f-4d69-971f-45d3f672ccc5_780x524.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!P83j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3390e4e3-d63f-4d69-971f-45d3f672ccc5_780x524.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A perception-action loop that works in concert to minimize free energy (<a href="https://www.nature.com/articles/nrn2787">source</a>).</figcaption></figure></div><p>The steps in the loop are:</p><ol><li><p>Brain predicts (&#8220;dreams&#8221;) sensory state: <em>arm is at green circle</em></p></li><li><p>Sensor reports: <em>arm is at red circle</em> &#8594; prediction error</p></li><li><p>Action is triggered to <em>make the prediction true</em> &#8594; the arm moves</p></li><li><p>When the arm reaches green, sensor input matches prediction. The prediction error goes to zero, and free energy is minimized.</p></li></ol><p>The takeaway here is that free energy crucially depends on both the perception and action systems, and is meaningless without them. The brain&#8217;s functioning cannot be separated from the body.</p><h3>Mechanical Intelligence or Body Schema</h3><p>Commonly, we think of a single holistic brain interfacing with all the sensors and controlling all the motor functions in an animal. Brooks&#8217; subsumption architecture, and to some extent Gibson&#8217;s view, allowed for a more distributed notion of computation.</p><p>Computation can be embedded in parts of the body without requiring the brain to be involved. 
In the 2005 book <a href="https://ndpr.nd.edu/reviews/how-the-body-shapes-the-mind/">How the Body Shapes the Mind</a>, Gallagher describes the concept of a &#8220;body schema,&#8221; which operates without any conscious activity of the brain.</p><blockquote><p>My body schema is what arranges that my hand shape itself just so in order to pick up a pencil without my paying any attention to how it is shaped, it is what tightens my back muscles and adjusts my posture when I shake hands so that I do not throw myself off balance with the movement, and so on. It operates (to a first approximation) independently of what I think or how I feel.</p></blockquote><p>Reflexes or <a href="https://en.wikipedia.org/wiki/Preflexes">preflexes</a> are also examples of this kind of mechanical intelligence or passive dynamics. Computers don&#8217;t always need to be built with chips; sometimes tendons and muscles can implement PID control!</p><p>In the history of robotics, some of the most exciting milestones in robotic mobility made use of this kind of mechanical intelligence. 
These include <a href="https://www.youtube.com/watch?v=WOPED7I5Lac">Tad McGeer&#8217;s completely unpowered walkers</a> (~1990),</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;b4239fa6-be01-4135-947d-7f88bf24f8d5&quot;,&quot;duration&quot;:null}"></div><p>the <a href="https://en.wikipedia.org/wiki/Rhex">first outdoor running robot, RHex</a> (~1999), the <a href="https://www.youtube.com/watch?v=_HhwLE5tw-M">IHMC OutRunner running robot (~2014)</a>,</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;768f083f-fc55-4830-b6c4-eb36aa109b53&quot;,&quot;duration&quot;:null}"></div><p>the <a href="https://www.youtube.com/watch?v=YFEJvb8iM7A">ATRIAS robot (~2015)</a> and its modern descendants at Agility Robotics, etc.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;3564fec0-711b-4bfe-b50d-2c5af0f14b6f&quot;,&quot;duration&quot;:null}"></div><p>There&#8217;s a lot more to say about mechanical intelligence (general purpose vs. task-specific design, power amplification, latching mechanisms, etc.) that I plan to cover in a dedicated future article &#8212; subscribe to make sure you don&#8217;t miss it!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><p>The takeaway for this article is that computation need not always go through a central brain. 
Hierarchically distributed mechanical intelligence can complement a brain, as it surely does for animals, reducing the burden of the nervous system:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Li9T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adb5e11-aa42-4653-a438-dba8588f700c_400x246.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Li9T!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adb5e11-aa42-4653-a438-dba8588f700c_400x246.gif 424w, https://substackcdn.com/image/fetch/$s_!Li9T!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adb5e11-aa42-4653-a438-dba8588f700c_400x246.gif 848w, https://substackcdn.com/image/fetch/$s_!Li9T!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adb5e11-aa42-4653-a438-dba8588f700c_400x246.gif 1272w, https://substackcdn.com/image/fetch/$s_!Li9T!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adb5e11-aa42-4653-a438-dba8588f700c_400x246.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Li9T!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adb5e11-aa42-4653-a438-dba8588f700c_400x246.gif" width="400" height="246" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3adb5e11-aa42-4653-a438-dba8588f700c_400x246.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:246,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;View from below a tank in which a (dead) fish swims upstream behind an obstacle&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="View from below a tank in which a (dead) fish swims upstream behind an obstacle" title="View from below a tank in which a (dead) fish swims upstream behind an obstacle" srcset="https://substackcdn.com/image/fetch/$s_!Li9T!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adb5e11-aa42-4653-a438-dba8588f700c_400x246.gif 424w, https://substackcdn.com/image/fetch/$s_!Li9T!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adb5e11-aa42-4653-a438-dba8588f700c_400x246.gif 848w, https://substackcdn.com/image/fetch/$s_!Li9T!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adb5e11-aa42-4653-a438-dba8588f700c_400x246.gif 1272w, https://substackcdn.com/image/fetch/$s_!Li9T!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adb5e11-aa42-4653-a438-dba8588f700c_400x246.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">This (dead) fish appears to swim upstream due to its passive dynamic interactions with water (<a href="https://fyfluiddynamics.com/2018/07/when-i-was-a-child-my-father-would-take-me-trout/">source</a>)</figcaption></figure></div><p>Returning to the <a href="https://www.darpa.mil/news/2026/rethinking-robotics">DARPA Physical Intelligence RFI</a>, it is clear that this type of distributed intelligence is what they are looking for:</p><blockquote><p>Rather than relying on centralized processors and large data flows, DARPA is exploring materials that can perform computation directly.</p></blockquote><h2>Returning to the Present</h2><p>With the practically motivated goal of utilizing the best technology at our disposal to build the most capable robots, which of these (potentially conflicting) ideas should we bring 
along?</p><h3>Behavior Cloning and Multi-Robot Brains</h3><p>Behavior cloning (imitating a human performing a task with a robot) is the de facto methodology for modern humanoid robotics. Data of humans doing a huge variety of tasks is <a href="https://itcanthink.substack.com/p/how-can-we-get-enough-data-to-train">plentiful and readily available</a>.</p><p>This paradigm runs into some challenges in light of the embodiment theories we just discussed:</p><ol><li><p>Mechanical intelligence is invisible in the dataset. If the body is performing stabilization effectively, the data only contains the stable configuration without any indication of the act of stabilization itself. This is one of the reasons VLA actions typically have some stability structure wrapped around them (<a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">like an impedance controller</a>). It isn&#8217;t possible to directly learn a complex dynamical behavior like gymnastics purely from imitation (the next section discusses how this is <em>actually</em> done).</p></li><li><p>When training data is from even a very slightly different body, the Gibsonian view is that the affordances aren&#8217;t compatible. In fact, we do see that small-scale <a href="https://www.avikde.me/p/debugging-as-architecture-insight">VLAs are actually quite poor at cross-embodiment generalization</a>. Scale and cross-embodiment training show promising results, but have an efficiency cost. For fuller coverage of this topic, check out my previous <a href="https://www.avikde.me/p/a-multi-robot-brain-is-not-like-a?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">article on multi-robot brains</a>.</p></li><li><p>Insects and elephants both walk, but they use very different types of actuators and sensors in their embodiment to do so. High-level task strategies may rely heavily on the differences in how these components work. 
Here is a very relevant passage discussing Gallagher&#8217;s account:</p></li></ol><blockquote><p>At the age of nineteen, Waterman lost all sense of touch and proprioception below his neck. As a result, he instantly lost the ability to control the affected parts of his body. Slowly, he regained the ability to walk, dress, eat, and so on, but in order to do these things he had to learn to do them in a new way: by alert conscious control of his every movement. Waterman must consciously adjust his balance when turning a corner, think about swinging his leg to take a step, make an effort to shape his hand into a position suited to gripping a mug if he wants to pick it up, and so on. As a result of this, he remains substantially disabled in his behavior. His case demonstrates that, while one&#8217;s body schema is not strictly necessary for movement, it is necessary for movement in normal human beings and necessary for fluent movement even after extensive retraining.</p><p>Waterman is, as Gallagher puts it, a man with a body image but without a body schema.</p></blockquote><p>A robot certainly has a different body schema from the human demonstrating a skill, so we should expect skill transfer to run into some of the same issues.</p><p>The takeaway here is that the architecture of embodied intelligence is not just a brain in a vat puppeteering a body; it also includes distributed, hierarchical computing.</p><h3>A Potential Resolution</h3><p>Despite the incompatibilities with embodiment theories, behavior cloning across embodiments does seem to work empirically, as evidenced by the <a href="https://www.avikde.me/p/a-multi-robot-brain-is-not-like-a">growing number of robotic foundation model demonstrations</a>. 
In addition, we see highly dynamic motion transferred to robots:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;22cf2c5c-9ccd-4b90-81e4-4dc4a1406df1&quot;,&quot;duration&quot;:null}"></div><p>That last video&#8217;s most crucial component is something we haven&#8217;t discussed yet: <em>reinforcement learning (RL)</em>. In this step, the robot is placed in a simulation with its embodiment in its environment &#8212; <em>all the components of Gibson&#8217;s affordance structure</em> &#8212; and set loose to optimize its behavior over thousands or millions of trials.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>This combination is powerful.</p><p>The AI brain (trained without preference for a particular embodiment) provides higher-level cognitive intelligence, a menu of task-solving strategies, and examples of basic perception and motor skills. Think of it as an instruction manual, or a compendium of videos of someone playing a sport that you want to learn, say tennis.</p><p>However, if you want to actually play tennis, there is <em>no alternative</em> to using your eyes to track the ball, and using your wrists to flick the racket for a forehand. The very term &#8220;muscle memory&#8221; indicates something that cannot be transferred from demonstration. 
RL (or its model-based alternative, trajectory optimization), however, gives a robot a mechanism to gain such practice with its own body, in the requisite environment.</p><p>A fruitful recipe for training a robot to play tennis could include a very high-level instruction manual in the form of human demonstration, followed by extensive RL with the robot&#8217;s actual sensors, potentially including sensory modalities not even available in the demonstrations.</p><p>This also makes room for incorporating innovation in meta-materials as in the DARPA RFI &#8212; those modalities can be introduced at the RL stage, enabling rich, distributed, hierarchical, and feedback-enhanced architectures.</p><p>The RFI itself is a bit pessimistic about human-like form factors as the be-all and end-all:</p><blockquote><p>Additionally, while industry has emphasized human-like form factors designed to operate in human environments, of interest here are systems optimized for mission needs. Depending on the application, this could include designs that are smaller, larger, softer, or structurally unconventional, prioritizing performance and adaptability over familiarity.</p></blockquote><p>However, there&#8217;s a lot of potential in hierarchical integration of AI brains with smart materials that can perform sensing, actuation, and computation. They may come together in a humanoid form or not (we discussed <a href="https://www.avikde.me/p/the-first-paradigm-in-robotics-and">form factor diversification in robotics in this post</a>). It just needs a bit of care in the methods used!</p><p>Thanks for reading! 
<em>If you liked this post, please like (&#10084;&#65039; button), restack, and subscribe &#8212; it helps others find my writing.</em></p><h2>Further Reading</h2><p>In related past articles, I&#8217;ve written about the architecture of end-to-end robotics pipelines, and architectural strengths and limitations of deep neural networks. If you liked this post, these would be great reads:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;8dfc1ea6-0f69-4292-b797-8864610f907b&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The architecture behind &#8220;end-to-end&#8221; robotics pipelines&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-26T21:19:56.368Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!766O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-architecture-behind-end-to-end&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:185869291,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:26,&quot;comment_count&quot;:15,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;97f84b2e-b499-48bb-9a1a-f33459e98f6e&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How an LLM Changes its Mind&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-05T12:14:39.484Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ghz-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/how-an-llm-changes-its-mind&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:196138155,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:2,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;938d550b-fecf-4433-9f00-85868dd32a79&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;A Multi-Robot Brain is not like a Multi-Chip ISA&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-15T16:42:00.865Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!sjJB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/a-multi-robot-brain-is-not-like-a&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:197402306,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>For this article, I returned to these great Substack posts by other authors. 
Check them out too:</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:185737949,&quot;url&quot;:&quot;https://wheremachinesthink.substack.com/p/the-case-for-world-models-part-i&quot;,&quot;publication_id&quot;:5277805,&quot;publication_name&quot;:&quot;WHERE MACHINES THINK&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Yem8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6531b6c-86e3-4240-8372-b5a887412b64_608x608.png&quot;,&quot;title&quot;:&quot;The Case For World Models, Part I: The Neuroscientific Reason&quot;,&quot;truncated_body_text&quot;:&quot;LOOK at the two images above. What do you see?&quot;,&quot;date&quot;:&quot;2026-02-09T06:39:31.033Z&quot;,&quot;like_count&quot;:55,&quot;comment_count&quot;:9,&quot;bylines&quot;:[{&quot;id&quot;:328415354,&quot;name&quot;:&quot;Anil Ananthaswamy&quot;,&quot;handle&quot;:&quot;anilananth&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf1b6a95-42d9-4ec4-ac36-43daab10f105_3024x3024.jpeg&quot;,&quot;bio&quot;:&quot;Ex-Software Eng. / Author / Former Dep. News Editor, New Scientist. Bylines in NS, Nature, SciAm, Quanta &amp; more. Books: The Edge of Physics, The Man Who Wasn't There, Through Two Doors at Once and Why Machines Learn. 
Prof of Practice, IIT-Madras&quot;,&quot;profile_set_up_at&quot;:&quot;2025-06-09T01:26:52.231Z&quot;,&quot;reader_installed_at&quot;:&quot;2026-01-22T08:27:17.455Z&quot;,&quot;publicationUsers&quot;:[],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null,&quot;status&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:null,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:null,&quot;paidPublicationIds&quot;:[],&quot;subscriber&quot;:null}}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;,&quot;source&quot;:null}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://wheremachinesthink.substack.com/p/the-case-for-world-models-part-i?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!Yem8!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6531b6c-86e3-4240-8372-b5a887412b64_608x608.png" loading="lazy"><span class="embedded-post-publication-name">WHERE MACHINES THINK</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">The Case For World Models, Part I: The Neuroscientific Reason</div></div><div class="embedded-post-body">LOOK at the two images above. 
What do you see&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">4 months ago &#183; 55 likes &#183; 9 comments &#183; Anil Ananthaswamy</div></a></div><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:165439695,&quot;url&quot;:&quot;https://itcanthink.substack.com/p/how-can-we-get-enough-data-to-train&quot;,&quot;publication_id&quot;:2883266,&quot;publication_name&quot;:&quot;It Can Think!&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!13Dp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a886fd-347d-4694-b670-0253975d2ba9_659x547.png&quot;,&quot;title&quot;:&quot;How can we get enough data to train a robot GPT?&quot;,&quot;truncated_body_text&quot;:&quot;It&#8217;s no secret that large language models are trained on massive amounts of data - many trillions of tokens. Even the largest robot datasets are quite far from this; in a year, Physical Intelligence collected about 10,000 hours worth of robot data to train their first foundation model, PI0. 
Professor Ken Goldberg of UC Berkeley gave a talk which Andra K&#8230;&quot;,&quot;date&quot;:&quot;2025-06-10T13:03:10.049Z&quot;,&quot;like_count&quot;:58,&quot;comment_count&quot;:5,&quot;bylines&quot;:[{&quot;id&quot;:232680664,&quot;name&quot;:&quot;Chris Paxton&quot;,&quot;handle&quot;:&quot;cpaxton&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!13Dp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a886fd-347d-4694-b670-0253975d2ba9_659x547.png&quot;,&quot;bio&quot;:&quot;Roboticist and AI researcher&quot;,&quot;profile_set_up_at&quot;:&quot;2024-06-07T00:31:03.267Z&quot;,&quot;reader_installed_at&quot;:&quot;2024-06-07T02:24:18.193Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:2930910,&quot;user_id&quot;:232680664,&quot;publication_id&quot;:2883266,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:2883266,&quot;name&quot;:&quot;It Can Think!&quot;,&quot;subdomain&quot;:&quot;itcanthink&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Robotics and AI; the future we're building and how we'll get there&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5a886fd-347d-4694-b670-0253975d2ba9_659x547.png&quot;,&quot;author_id&quot;:232680664,&quot;primary_user_id&quot;:232680664,&quot;theme_var_background_pop&quot;:&quot;#FF6719&quot;,&quot;created_at&quot;:&quot;2024-08-13T16:27:27.738Z&quot;,&quot;email_from_name&quot;:&quot;Chris Paxton from \&quot;It Can Think\&quot;&quot;,&quot;copyright&quot;:&quot;Chris Paxton&quot;,&quot;founding_plan_name&quot;:&quot;Founding 
Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;magaziney&quot;,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null,&quot;status&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:null,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:null,&quot;paidPublicationIds&quot;:[],&quot;subscriber&quot;:null}}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;,&quot;source&quot;:null}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://itcanthink.substack.com/p/how-can-we-get-enough-data-to-train?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!13Dp!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a886fd-347d-4694-b670-0253975d2ba9_659x547.png" loading="lazy"><span class="embedded-post-publication-name">It Can Think!</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">How can we get enough data to train a robot GPT?</div></div><div class="embedded-post-body">It&#8217;s no secret that large language models are trained on massive amounts of data - many trillions of tokens. Even the largest robot datasets are quite far from this; in a year, Physical Intelligence collected about 10,000 hours worth of robot data to train their first foundation model, PI0. 
Professor Ken Goldberg of UC Berkeley gave a talk which Andra K&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">a year ago &#183; 58 likes &#183; 5 comments &#183; Chris Paxton</div></a></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Astute readers will observe that domain randomization (a crucial component of RL in simulation) interferes with the affordance structure. In reality, the art is to randomize enough that the policy is robust, but not enough to span different strategies being required. As nicely stated by <a href="https://arxiv.org/pdf/2110.03239">this ICLR 2022 paper</a>, &#8220;With sufficient data sampled using the simulator, the agent can find a near-optimal policy w.r.t. the average value function over a variety of simulation environments.&#8221; Over-randomizing can lead to the average value function not being optimal for the actual instantiation.</p></div></div>]]></content:encoded></item><item><title><![CDATA[A Multi-Robot Brain is not like a Multi-Chip ISA]]></title><description><![CDATA["Cross-embodiment" trained policies generalize well, but is that the best solution?]]></description><link>https://www.avikde.me/p/a-multi-robot-brain-is-not-like-a</link><guid isPermaLink="false">https://www.avikde.me/p/a-multi-robot-brain-is-not-like-a</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Fri, 15 May 2026 16:42:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!sjJB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We have recently seen the emergence of a number of &#8220;multi-robot brains,&#8221; or AI 
models that are meant to output motor commands for a variety of robot bodies. Some examples are Skild AI&#8217;s omni-bodied brain, which they <a href="https://www.skild.ai/blogs/omni-bodied">argue for here</a>, and Physical Intelligence&#8217;s <a href="https://www.pi.website/blog/pi0">pi0 and later models</a>. The capabilities of these policies are impressive, showing signs of generalization and fault tolerance that have not been demonstrated before.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sjJB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sjJB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png 424w, https://substackcdn.com/image/fetch/$s_!sjJB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png 848w, https://substackcdn.com/image/fetch/$s_!sjJB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png 1272w, https://substackcdn.com/image/fetch/$s_!sjJB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!sjJB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png" width="1670" height="874" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:874,&quot;width&quot;:1670,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:560476,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/197402306?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9401f11d-548d-401f-9b6b-fbee5da5dc3b_1670x891.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sjJB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png 424w, https://substackcdn.com/image/fetch/$s_!sjJB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png 848w, https://substackcdn.com/image/fetch/$s_!sjJB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png 1272w, https://substackcdn.com/image/fetch/$s_!sjJB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e2aeaa-1e51-4278-befe-ae49196ae1a2_1670x874.png 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p></p><p>It feels natural to try and compare this kind of brain-body interface to a software-hardware interface in computers. An Instruction Set Architecture (ISA) helps abstract software programming from processor hardware implementation. 
The first ever ISA, in the <a href="https://www.ibm.com/history/system-360">IBM System/360 (1964)</a>, had the explicit goal of making software compatible across different hardware generations, and this goal has remained the most important driving force behind their existence.</p><p>In this article, I wanted to see how this analogy could help us think about multi-robot brains.</p><h2>The ISA Analogy</h2><p>The benefits of an ISA become clear when examining a typical &#8220;stack&#8221; for a computer:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/EEDVp/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8cce941-a7ae-4ebb-800f-67e7f15b3918_1220x468.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d91cb3a-a910-4da4-a775-6cb1ed426195_1220x538.png&quot;,&quot;height&quot;:259,&quot;title&quot;:&quot;Computer stack&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" 
class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/EEDVp/2/" width="730" height="259" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>The columns of this chart are different &#8220;verticals&#8221;. If a company owns every item in a column, that would be &#8220;vertical integration&#8221;. The iPhone, and most of Apple&#8217;s products, are famously vertically integrated up to the application layer. The PC ecosystem does not exhibit the same kind of vertical integration.</p><p>Conversely, &#8220;horizontal integration&#8221; is when a particular product or vendor appears in many columns across a row of the stack. For example, the Android OS is used in 70% of smartphones sold worldwide.</p><p>The ISA allows the stack to grow horizontally without recreating every piece of the stack. 
The marginal effort required to introduce new software, or a new chip, is reduced as long as it can bridge itself to the ISA.</p><p>Let&#8217;s compare to a robot stack<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> for a couple of hypothetical robots:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/2xoDV/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec5a30bd-42d3-46d5-a300-f351a81d7426_1220x620.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2fa747db-4e3b-4d55-b959-ca3ca81e33fe_1220x620.png&quot;,&quot;height&quot;:298,&quot;title&quot;:&quot;[ Insert title here ]&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/2xoDV/1/" width="730" height="298" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>Vertical integration is <a href="https://www.robonaissance.com/p/inside-chinas-machine-unitree">exemplified by Unitree</a>, and it could be argued that NVIDIA is positioning itself very well to be horizontally integrated as the computing device of choice.</p><p>The multi-robot brain (a) vertically bridges hardware and software, and (b) horizontally bridges different robots. 
In contrast, while the ISA bridges vertically, it does <em>not</em> bridge horizontally &#8212; the iPhone and the Android phone need not share any software or hardware components. This is an important distinction, because the horizontal bridging <em>hinders vertical integration</em>.</p><h2>Vertical Integration</h2><p>An ISA promotes vertical bridging, but simply gets out of the way after that, allowing for vertical integration. This enables products that are optimized to work well in a way that isn&#8217;t possible with horizontally integrated components. An easy example of this is how well sleep and wake work on a MacBook due to the tight vertical integration, and how poorly they still work on Windows laptops.</p><p>The Arm AGI CPU was introduced recently because, when performance is critical, companies were still building bespoke software to take advantage of vertically-integrated optimizations. Arm saw this need, and it was large enough to get them to break out of a <a href="https://thechipletter.substack.com/p/arm-makes-chips">decades-old strategic choice and build their own chip</a> (emphasis mine):</p><blockquote><p>&#8220;Delivering AI experiences at global scale demands a robust and adaptable portfolio of custom silicon solutions, <strong>purpose-built to accelerate AI workloads and optimize performance</strong> across Meta&#8217;s platforms,&#8221; said Santosh Janardhan, head of infrastructure, Meta.</p></blockquote><p>For critical datacenter functionality, the last mile of performance optimization needed vertical integration. 
Others, like Amazon and Google, were already doing it by customizing their stacks, and Arm decided to try to save its customers from this work by doing it itself.</p><h2>The Missing Analogue of Compilation</h2><p>In computer-land, the ISA is paired with a compiler that builds a binary tailored to the target system. The compiler can optimize and produce a binary that runs well on a particular chip using vector instructions, reordering, and other optimizations within the constraints of program correctness.</p><p>There isn&#8217;t an equivalent of compilation in a multi-robot deep neural network brain (though <a href="https://www.avikde.me/p/how-an-llm-changes-its-mind">Turing-complete architectural alternatives</a> could emerge in the future). This results in a few cracks in the ISA analogy:</p><h3>1) Memory / Performance</h3><p>The current multi-robot brain is more like a <a href="https://en.wikipedia.org/wiki/Fat_binary">fat binary</a> in some ways. While the exact implementation varies, there will usually be a vector of latent variables that explicitly or implicitly encodes the details of the embodiment. So, while a Mac universal binary could use a single bit to pick x86 or Arm machine code, the robot embodiment is represented by an <em>n</em>-dimensional vector of latent state that can further weight or guide the network.</p><p>In the Skild AI blog post referenced above, this &#8220;fat binary&#8221; aspect is used to demonstrate recovery from limb loss and other kinds of hardware failure. 
The tradeoff is the added network size, manifesting as increased memory and compute burden.</p><h3>2) Hardware-Specific Optimization</h3><p>A multi-robot brain needs to adapt to factors such as the location and dimensions of limbs and cameras, as well as higher-level locomotion and task strategy. For the kinematics and sensor transforms, optimized code is akin to platform-specific subroutines, an analogy I made in my <a href="https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics">article series on VLAs</a>.</p><p>A robotic behavior &#8220;compilation&#8221; step could likewise tailor task execution strategy to a particular body. For example, <a href="https://underactuated.mit.edu/trajopt.html">trajectory optimization</a> or reinforcement learning (RL) can be used to optimize a behavior for a robot body and, like compilation, this is typically done offline.</p><p>However, in a <em>neural network</em> multi-robot brain (which does not have this notion of compilation) designed to solve a task with <em>B</em> bodies, either the behavior is not optimized for each body type, or <em>B</em> different strategies need to be stored (bringing back the memory issue from the prior point). In reality, the multi-robot brain does store some number of distinct-looking strategies (see the Skild AI blog post for demonstrations).</p><p>By the nature of deep neural networks, though, a discontinuous divergence in strategies is <a href="https://www.avikde.me/p/how-an-llm-changes-its-mind">not very efficiently representable</a>. As an example of strategy discontinuity, consider that my cat needs to run or jump up human-sized stairs, whereas a larger animal could simply walk up them. The multi-robot brain needs to do a lot of work to represent effective locomotion strategies for these variations.</p><h2>Common Sense Generalization</h2><p>One of the main arguments in favor of cross-embodiment training is that practitioners report it helps with generalization. 
For example, see this <a href="https://youtu.be/n-pLDaZDO9k?si=_BNGkK72CyQF6KMG&amp;t=1343">excerpt of Sergey Levine&#8217;s explanation</a> on the Automated podcast two days ago, and Skild&#8217;s case in <a href="https://www.skild.ai/blogs/omni-bodied">their blog post</a>:</p><blockquote><p>One way to do this is to train the AI to control not just one robot, but a whole multiverse of robots with different bodies. It cannot memorize the solution for one body, it must find a strategy that works across all of them. When faced with unpredictable scenarios, the AI can now use the strategies it learnt during training and keep going.</p></blockquote><p>Research on this finding is still ongoing, but one implicit logical leap is that the strategy is evaluated (in all these cases) with end-to-end networks. This means that cross-embodiment training is applied to a deep neural network that is learning not just the particulars of one robot body, but also the functional form of the mathematical transformations needed to control it.</p><p>As an analogy, consider training a network to perform the sine calculation,</p><p><em>y = f(x) := sin(x)</em></p><p>Suppose we don&#8217;t know what the &#8220;<em>sin()</em>&#8221; operation does, and are learning it from input-output pairs <em>{x, y}</em>. To train this network, we have the option of using data from one embodiment, which has <em>x</em> values in the range <em>[0, 1]</em>, or data from multiple embodiments whose <em>x</em> values together cover <em>[-10, 10]</em>. The extra data from the other embodiments helps the network learn how the sine function itself behaves, beyond the narrow slice that any single embodiment sees.</p><h2>Closing Thoughts</h2><p>Cross-embodiment robot training creates an interesting software stack that bridges software to hardware vertically, and also bridges horizontally across different tasks and robot bodies. 
The former aspect is evocative of the ISA abstraction layer in computing, while the latter is where the two differ.</p><p>I wonder if a &#8220;compilation&#8221; analogue could exist, one that optimizes a multi-robot policy down to a slimmed-down robot-specific policy. It&#8217;s possible to use RL to post-train or to fine-tune a model such as the ones discussed in this article, but that interferes with the fault-tolerance feature and does not help reduce the model&#8217;s size.</p><p>Lastly, these multi-robot policies will necessarily be large. Taken to the scaling extreme, only the largest corporations may have the funding to train these models, <a href="https://www.avikde.me/p/the-first-paradigm-in-robotics-and">resulting in potentially unwanted ecosystem consolidation</a>.</p><p>To sum up, multi-robot brains have been showing impressive generalization ability, and I expect we will continue to see cutting-edge results from them. However, their generalization success is partially wrapped up with the end-to-end deep neural network architecture, and there might be opportunities for significant optimization with architectural innovation.</p><p>Thanks for reading, and let me know your thoughts on this parallel!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/a-multi-robot-brain-is-not-like-a/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/a-multi-robot-brain-is-not-like-a/comments"><span>Leave a comment</span></a></p><p></p><p><em>If you enjoyed this post, please like (&#10084;&#65039;) and restack &#8212; it helps others find my writing. Subscribe to receive new posts. 
All of this is greatly appreciated.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The stack could have been written differently to include sensors etc., but those details don&#8217;t affect the point of this article.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[How an LLM Changes its Mind]]></title><description><![CDATA[Safety and efficiency with universal approximators and Turing machines]]></description><link>https://www.avikde.me/p/how-an-llm-changes-its-mind</link><guid isPermaLink="false">https://www.avikde.me/p/how-an-llm-changes-its-mind</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Tue, 05 May 2026 12:14:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ghz-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Deep neural networks are unlocking solutions to new classes of problems seemingly on a monthly or weekly basis. The capabilities of LLMs, coding assistants, and agents are very impressive, but it&#8217;s also easy to get a bit carried away about what they are actually doing when they are provocatively referred to as artificial intelligence. 
They are still algorithms, and it&#8217;s good to take a step back to look at the type of algorithm they actually are.</p><p>Fortunately, we know a lot about what deep neural networks represent. As a starting point, the <strong><a href="https://en.wikipedia.org/wiki/Universal_approximation_theorem">universal approximation theorem</a> (UAT)</strong> says that a <strong>feedforward neural network</strong> with at least one hidden layer can <strong>approximate any continuous function over a compact domain</strong> to any desired degree of accuracy, provided it has enough neurons and a nonlinear activation function.</p><p>This raises a number of follow-up questions:</p><ul><li><p>What kinds of tasks are (not) solved by approximating a continuous function?</p></li><li><p>For this purpose, are transformers equivalent to feedforward neural networks, or do they do something different?</p></li><li><p>How do these map to computational hardware, like CPUs, GPUs, or NPUs?</p></li></ul><p>Answering these questions requires a review of what &#8220;computation&#8221; means, looking all the way back to the writings of Turing, Minsky, and Chomsky. In exchange, we get some insights into the versatility as well as the energetic cost of current AI.</p><p>I&#8217;ll provide some answers to the first two questions in this post, and a detailed look at the last one in a follow-up.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h2>Universal Approximation</h2><p>The prototypical &#8220;feedforward neural network&#8221; from the UAT is a multi-layer perceptron (MLP). 
This is typically composed of linear layers (which multiply its inputs by a weighting matrix) and a nonlinear activation function.</p><p>In the plots below<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, we&#8217;re approximating a quasi-sinusoidal curve on the left and a square wave on the right using an MLP.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ghz-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ghz-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 424w, https://substackcdn.com/image/fetch/$s_!ghz-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 848w, https://substackcdn.com/image/fetch/$s_!ghz-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 1272w, https://substackcdn.com/image/fetch/$s_!ghz-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ghz-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png" width="516" 
height="276.0524781341108" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70167b78-b951-4a03-8815-19710a04b7d0_686x367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:686,&quot;resizeWidth&quot;:516,&quot;bytes&quot;:61200,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/196138155?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ghz-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 424w, https://substackcdn.com/image/fetch/$s_!ghz-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 848w, https://substackcdn.com/image/fetch/$s_!ghz-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 1272w, https://substackcdn.com/image/fetch/$s_!ghz-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>With larger model width and depth:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RSJR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RSJR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 424w, 
https://substackcdn.com/image/fetch/$s_!RSJR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 848w, https://substackcdn.com/image/fetch/$s_!RSJR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 1272w, https://substackcdn.com/image/fetch/$s_!RSJR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RSJR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png" width="519" height="276.3021582733813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:370,&quot;width&quot;:695,&quot;resizeWidth&quot;:519,&quot;bytes&quot;:55233,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/196138155?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!RSJR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 424w, https://substackcdn.com/image/fetch/$s_!RSJR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 848w, https://substackcdn.com/image/fetch/$s_!RSJR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 1272w, https://substackcdn.com/image/fetch/$s_!RSJR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>You&#8217;ll notice that the square wave is much more difficult to approximate than the sinusoidal one. Why is that? If you recall from above, the UAT promised that the MLP would be good at approximating <strong>continuous functions</strong>, and the square wave has periodic discontinuities.</p><p>Before you think that this is some pedantic example that would never occur in practice, let me offer two more practical ones that are equivalent.</p><p>Suppose you have a drone flying through a forest of tall trees:</p><div id="youtube2-m89bNn6RFoQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;m89bNn6RFoQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/m89bNn6RFoQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The task is obstacle avoidance: the input is the front camera view, and the output we&#8217;d like is a path that won&#8217;t collide with a tree. 
In such a view, if the view changes <em>continuously</em> in such a way that a path becomes too narrow to pass through, the safe path must jump to a different one <em>discontinuously</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B9iV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B9iV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 424w, https://substackcdn.com/image/fetch/$s_!B9iV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 848w, https://substackcdn.com/image/fetch/$s_!B9iV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 1272w, https://substackcdn.com/image/fetch/$s_!B9iV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B9iV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png" width="491" height="201" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:201,&quot;width&quot;:491,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18687,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/196138155?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B9iV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 424w, https://substackcdn.com/image/fetch/$s_!B9iV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 848w, https://substackcdn.com/image/fetch/$s_!B9iV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 1272w, https://substackcdn.com/image/fetch/$s_!B9iV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>An MLP would need to sample the input space very densely to quickly interpolate between the left path and the right one (as in the square wave example above). 
This carries a high model-size penalty, and the interpolation additionally passes through an unsafe part of the output space.</p><p>Another practical example is related to the title of the article. Assuming an LLM&#8217;s output is a token view of an internal reasoning state, &#8220;changing its mind&#8221; on a yes / no question requires a similar jump in its state. However, the internal computing machinery of a modern LLM, the transformer, is more complex than an MLP. We&#8217;ll look into the two categories separately below.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><p></p><h2>Lookup Tables to Turing Machines</h2><p>The computational power guaranteed by the UAT is equivalent to that of a lookup table. A lookup table effectively pairs inputs and outputs so that it can &#8220;look up&#8221; the appropriate output when queried with an input. In continuous spaces, this can include some interpolation or extrapolation. The curve approximation figure above is a good visualization of this: the table would contain {x, y} entries. The compact domain condition of the UAT effectively ensures that, for a given accuracy, the number of entries in the lookup table is finite.</p><p>At the other end of the complexity spectrum, we have a <a href="https://en.wikipedia.org/wiki/Turing_machine">Turing machine</a>: an automaton that has access to unbounded memory, and is able to make discrete decisions based on what is in its memory. While this may sound foreign, it is actually a very familiar concept. A CPU paired with almost any programming language is a Turing machine (putting aside the implementation detail of potentially running out of memory). You can control a program&#8217;s flow using <code>if</code>, <code>while</code>, etc. 
and call subroutines, and with these building blocks, you can build any software that has ever been written.</p><p>It should be clear that a Turing machine can do fundamentally more than a lookup table:</p><ol><li><p>It can process an input that is arbitrarily large, which a lookup table cannot do. For example, you can <a href="https://en.wikipedia.org/wiki/Integer_factorization">very easily write</a> a CPU program that factorizes an integer, but we could never fit such an algorithm on a lookup table, since you could always input a larger integer. A more current example is a pre-transformer language model, which could not handle sequences of arbitrary length, and thus could not exhibit the level of capability we got with a GPT.</p></li><li><p>It can exhibit irregular flow control, like branching and jumping. In the &#8220;flying through forest&#8221; example above, it can do something like</p></li></ol><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">if left_path_too_narrow:
    take_right_path()
else:
    take_left_path()</code></pre></div><p>While this looks benign, it is deeply connected to the continuity clause of the UAT. An MLP cannot exactly represent an algorithm that relies on this kind of branching to produce a discontinuous or symbolic jump; it can only approximate it.</p><p>In the example above, an MLP was still able to approximate the square wave, but at the expense of a large number of parameters. As a contrast, here&#8217;s an almost trivial program that could accomplish the requisite classification with very few parameters:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;f5665b4d-538b-4906-806e-4237d44f3842&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">def square_wave(x):
    if x % 2 &lt; 1:  # if the remainder of x / 2 is &lt; 1
        return 1
    else:
        return -1</code></pre></div><p>This shows the expressive power of a Turing machine compared to a lookup table. Adding a little structural or organizational complexity drastically reduced the number of required parameters.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h2>The Transformer Attention Mechanism</h2><p>We discussed earlier how the UAT only addresses inputs from a bounded domain of fixed dimension. This is true in practice for MLPs as well: an MLP will typically be used to process a fixed image size, or in a transformer feedforward network, a fixed layer width.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>The attention mechanism of transformers is different. In an LLM, when a sequence of tokens is fed in, each token can attend to every other token, enabling a computation paradigm that can handle sequences of arbitrary length. This makes it different from a lookup table, because the input <em>dimension itself is unbounded.</em> You don&#8217;t need to retrain for longer sequences, since the same attention weights are applied regardless of sequence length.</p><p>In practical terms, a transformer&#8217;s sequence length has to be limited to a maximum context length to manage the mapping to computational hardware. By the same token, CPUs would also need unbounded memory to be true Turing machines.</p><p>So, are implementable transformers, like general purpose CPU programs, Turing machines in all but the most pedantic terms?</p><p>Not quite &#8212; there&#8217;s still a fundamental gap that cannot be closed. Transformers are still continuous function approximators and cannot efficiently exhibit irregular flow control. 
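</p><p>To make both halves of that claim concrete, here is a minimal sketch of single-head self-attention in plain Python. It is an illustration under stated assumptions rather than how any production LLM is implemented: the weights are random stand-ins for trained parameters, there is no masking, positional encoding, or multi-head structure, and the helper names (<code>matmul</code>, <code>softmax</code>, <code>self_attention</code>, <code>random_matrix</code>) are hypothetical. It shows the same three weight matrices handling sequences of different lengths, and a tiny nudge to the input producing only a tiny change in the output.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax of one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention; tokens is a list of d_model-sized vectors."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(k[0]))
    keys_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, keys_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)

def random_matrix(rows, cols, rng):
    """Hypothetical helper: a rows-by-cols matrix of standard normal samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model = 4
# Assumption: random weights standing in for trained parameters
w_q, w_k, w_v = (random_matrix(d_model, d_model, rng) for _ in range(3))

# The same three weight matrices process a 3-token and a 7-token sequence
short_out = self_attention(random_matrix(3, d_model, rng), w_q, w_k, w_v)
long_seq = random_matrix(7, d_model, rng)
long_out = self_attention(long_seq, w_q, w_k, w_v)
print(len(short_out), len(long_out))  # one output vector per input token

# Continuity: nudging every input value by 1e-6 moves the output only slightly
nudged = [[x + 1e-6 for x in tok] for tok in long_seq]
nudged_out = self_attention(nudged, w_q, w_k, w_v)
max_change = max(abs(a - b) for r1, r2 in zip(nudged_out, long_out) for a, b in zip(r1, r2))
print(max_change)</code></pre></div><p>The first print reports one output vector per input token for both sequence lengths, with no change to the weights. The second reports how far the output moved after the nudge; a smooth response like this is consistent with the idea that an abrupt switch between outputs can only be approximated by a steep but continuous transition.</p><p>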
A <a href="https://arxiv.org/pdf/2602.11175">2026 paper from Oracle AI</a> looks at discrete reasoning with transformers, and I&#8217;ll let it speak for itself:</p><blockquote><p>Through this synthesis, we provide readers with a cohesive understanding of why transformers succeed in interpolation tasks (e.g. summarization) but fall short in reliably executing symbolic algorithms.</p></blockquote><p>Symbolic algorithms are characterized by discontinuous outputs that present a challenge to transformers. As in the square wave example above, you can try to circumvent the issue by increasing model width or dataset size, but this comes at the cost of a much larger and less efficient model. Moreover, as the paper points out, as you compose symbolic tasks (task A &#8594; task B &#8594; &#8230;), the number of switching boundaries grows combinatorially.</p><p>For an LLM to change its mind on a yes / no answer, architecturally it needs to continuously interpolate through reasoning trajectories, traversed by generating (lots of) reasoning tokens.</p><h2>Closing Thoughts</h2><p>Deep neural networks can solve a huge variety of problems thanks to their universal function approximation ability. Transformers&#8217; ability to process sequences of arbitrary length advances them into a new computational category beyond lookup tables.</p><p>However, they are still not well suited to problems with symbolic or discontinuous outputs. Such outputs are common in problems involving safety or symbolic reasoning. In current successes of deep learning, solutions to these kinds of problems are attained in a fashion similar to the square wave approximation above &#8212; it works, but is extremely inefficient.</p><p>These problems could potentially be solved with much smaller models if those models had Turing machine-style universal computation capabilities. 
<span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Devansh&quot;,&quot;id&quot;:8101724,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48081c70-8afa-41e3-a44e-b0f917bc7577_1200x1600.jpeg&quot;,&quot;uuid&quot;:&quot;d12070ea-e64f-4be9-9406-3b5a437c91d8&quot;}" data-component-name="MentionToDOM"></span>&#8217;s article linked below advocates for the same thing, approaching it from the computational hardware perspective for some classes of problems. In a follow-up post, I&#8217;ll connect the first-principles analysis in this post to current computational hardware, to discuss how different algorithm classes effectively map onto it.</p><p>Thanks for reading!</p><p><em>If you enjoyed this post, please like (&#10084;&#65039;) and restack &#8212; it helps others find my writing. Subscribe to receive new posts. All of this is greatly appreciated.</em></p><div data-component-name="FragmentNodeToDOM"><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/how-an-llm-changes-its-mind/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/how-an-llm-changes-its-mind/comments"><span>Leave a comment</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p></div><h2>References and Further Reading</h2><p><a href="https://lifeiscomputation.com/transformers-are-not-turing-complete/">Are Transformers Turing-complete?</a> &#8212; Hessam Akhlaghpour (2024)</p><p><a href="https://arxiv.org/pdf/2602.11175">Barriers to 
Discrete Reasoning with Transformers</a> &#8212; Oracle AI (2026)</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:166288637,&quot;url&quot;:&quot;https://www.artificialintelligencemadesimple.com/p/the-great-compute-re-architecture&quot;,&quot;publication_id&quot;:1315074,&quot;publication_name&quot;:&quot;Artificial Intelligence Made Simple&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Pfon!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77504fa0-0f08-4a38-bbde-becb151d2db8_643x644.png&quot;,&quot;title&quot;:&quot;The Great Compute Re-Architecture: Why Branching &amp; Sparsity Will Define the Next Decade of Silicon [Breakdowns]&quot;,&quot;truncated_body_text&quot;:&quot;It takes time to create work that&#8217;s clear, independent, and genuinely useful. If you&#8217;ve found value in this newsletter, consider becoming a paid subscriber. It helps me dive deeper into research, reach more people, stay free from ads/hidden agendas, and supports my crippling chocolate milk addiction.&quot;,&quot;date&quot;:&quot;2025-06-19T01:36:32.021Z&quot;,&quot;like_count&quot;:57,&quot;comment_count&quot;:17,&quot;bylines&quot;:[{&quot;id&quot;:8101724,&quot;name&quot;:&quot;Devansh&quot;,&quot;handle&quot;:&quot;chocolatemilkcultleader&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48081c70-8afa-41e3-a44e-b0f917bc7577_1200x1600.jpeg&quot;,&quot;bio&quot;:&quot;The best meme-maker in Tech. Writer on AI, Software, and the Tech Industry. Currently in NYC Come say hi, I want more friends. 
&quot;,&quot;profile_set_up_at&quot;:&quot;2021-08-21T20:28:53.612Z&quot;,&quot;reader_installed_at&quot;:&quot;2024-03-11T12:27:10.271Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:1274217,&quot;user_id&quot;:8101724,&quot;publication_id&quot;:1315074,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:1315074,&quot;name&quot;:&quot;Artificial Intelligence Made Simple&quot;,&quot;subdomain&quot;:&quot;artificialintelligencemadesimple&quot;,&quot;custom_domain&quot;:&quot;www.artificialintelligencemadesimple.com&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Covering the important ideas in AI from all angles- technical, social, and economic. Read in over 200 countries.  Useful to everyone who wants to learn AI. Critical to anyone trying to see what happens next. Sister Publication to Tech Made Simple.&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77504fa0-0f08-4a38-bbde-becb151d2db8_643x644.png&quot;,&quot;author_id&quot;:8101724,&quot;primary_user_id&quot;:8101724,&quot;theme_var_background_pop&quot;:&quot;#009B50&quot;,&quot;created_at&quot;:&quot;2023-01-14T23:37:24.692Z&quot;,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Devansh&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;magaziney&quot;,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}},{&quot;id&quot;:109622,&quot;user_id&quot;:8101724,&quot;publication_id&quot;:108704,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:108704,&quot;name&quot;:&quot;Technology Made 
Simple&quot;,&quot;subdomain&quot;:&quot;codinginterviewsmadesimple&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Deep yet digestible insights about Computer Science, Programming Interviews, Software Engineering Careers, Machine Learning, and the Tech Industry for Tech Leaders. Amazing For Coders and Managers. Beneficial to anyone trying to make money in Tech. &quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/8546dc69-af46-4d5d-9a80-b66cb76c833b_644x644.png&quot;,&quot;author_id&quot;:8101724,&quot;primary_user_id&quot;:null,&quot;theme_var_background_pop&quot;:&quot;#45D800&quot;,&quot;created_at&quot;:&quot;2020-10-07T10:47:41.199Z&quot;,&quot;email_from_name&quot;:&quot;Devansh from Tech Made Simple&quot;,&quot;copyright&quot;:&quot;Devansh&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;magaziney&quot;,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}},{&quot;id&quot;:5366623,&quot;user_id&quot;:8101724,&quot;publication_id&quot;:5261101,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:5261101,&quot;name&quot;:&quot;What's Happening In Tech&quot;,&quot;subdomain&quot;:&quot;whatishappeningintechnology&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;A Newsletter meant to Help People Keep Up With What's Happening in 
Tech&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff955b89-d08e-4cb7-8add-709e6dc14d8e_1080x1080.jpeg&quot;,&quot;author_id&quot;:8101724,&quot;primary_user_id&quot;:null,&quot;theme_var_background_pop&quot;:&quot;#FF6719&quot;,&quot;created_at&quot;:&quot;2025-06-07T04:30:33.908Z&quot;,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Devansh&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;newspaper&quot;,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}}],&quot;twitter_screen_name&quot;:&quot;Machine01776819&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:1000,&quot;status&quot;:{&quot;bestsellerTier&quot;:1000,&quot;subscriberTier&quot;:1,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:{&quot;type&quot;:&quot;bestseller&quot;,&quot;tier&quot;:1000},&quot;paidPublicationIds&quot;:[618139,1238074,1442076],&quot;subscriber&quot;:null}}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;,&quot;source&quot;:null}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://www.artificialintelligencemadesimple.com/p/the-great-compute-re-architecture?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!Pfon!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77504fa0-0f08-4a38-bbde-becb151d2db8_643x644.png" loading="lazy"><span class="embedded-post-publication-name">Artificial Intelligence Made Simple</span></div><div 
class="embedded-post-title-wrapper"><div class="embedded-post-title">The Great Compute Re-Architecture: Why Branching &amp; Sparsity Will Define the Next Decade of Silicon [Breakdowns]</div></div><div class="embedded-post-body">It takes time to create work that&#8217;s clear, independent, and genuinely useful. If you&#8217;ve found value in this newsletter, consider becoming a paid subscriber. It helps me dive deeper into research, reach more people, stay free from ads/hidden agendas, and supports my crippling chocolate milk addiction&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">a year ago &#183; 57 likes &#183; 17 comments &#183; Devansh</div></a></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The plots are generated from <a href="https://avikde.github.io/tiny-xpu/">this page</a> from the TinyXPU project, which you can read more about <a href="https://chipinsights.net/p/the-art-of-architectural-analysis">here</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>For a gentle introduction to transformers with a computer architecture framing, I&#8217;d recommend <a href="https://www.viksnewsletter.com/p/a-primer-on-transformer-architecture">Vik&#8217;s article</a>.</p></div></div>]]></content:encoded></item><item><title><![CDATA[The First Paradigm in Robotics & AI Research: Lessons from Computer Engineering]]></title><description><![CDATA[Commoditization and end-to-end learning have consolidated robotics and AI. 
What's next for research labs?]]></description><link>https://www.avikde.me/p/the-first-paradigm-in-robotics-and</link><guid isPermaLink="false">https://www.avikde.me/p/the-first-paradigm-in-robotics-and</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Wed, 29 Apr 2026 15:13:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4P2l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions">Thomas Kuhn wrote</a> that scientific fields settle into dominant <em>paradigms</em> that characterize phases of productive but incremental research. The very existence of a paradigm is evidence of the maturation of a field.</p><p>For robotics, we may be in the midst of the first time this has ever happened.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> The start of our research careers resembled the &#8220;wild west&#8221; of emerging techniques and technologies, but ideas have since converged. On the hardware side, robots have gotten good enough that thousands are being shipped and used by consumers and researchers alike. On the algorithm side, the bitter lesson and its corollary &#8212; hypothesized &#8220;scaling laws&#8221; &#8212; have provided a scaffolding around which progress can be evaluated. <a href="https://itcanthink.substack.com/p/vision-language-action-models-and">End-to-end behavior cloning policies</a> seem like they can generalize to all sorts of tasks, and performance predictably improves with more data. 
We&#8217;ll refer to these two trends as <em>commoditization</em> and <em>architectural convergence</em>, and discuss how they shape the current paradigm below.</p><p>The establishment of this current paradigm has also had side-effects on the nature of research that may in themselves be setting us up for paradigm <em>shifts</em>. While it is a bit of an overreach to use the term &#8220;revolution&#8221; for robotics (as Kuhn did for science), such a shift would be pivotal for researchers and is worth understanding.</p><p><em>This article is co-written by </em><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Avik De&quot;,&quot;id&quot;:356074997,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;uuid&quot;:&quot;74d7f96b-7849-4218-ad74-d6ae4e18d101&quot;}" data-component-name="MentionToDOM"></span> <em>and </em><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Chris Paxton&quot;,&quot;id&quot;:232680664,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!13Dp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a886fd-347d-4694-b670-0253975d2ba9_659x547.png&quot;,&quot;uuid&quot;:&quot;f2090487-de58-433d-99ed-65a4350be474&quot;}" data-component-name="MentionToDOM"></span><em>, both robotics researchers with experience in academia as well as industry. 
Chris writes about AI and robotics, and Avik writes about robotics, computing, and AI.</em></p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:7287367,&quot;name&quot;:&quot;min{power}&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;base_url&quot;:&quot;https://www.avikde.me&quot;,&quot;hero_text&quot;:&quot;Explorations in computing and robotics focused on power-efficiency and safety -- personal posts by Avik De, robotics Ph.D. and founder&quot;,&quot;author_name&quot;:&quot;Avik De&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#ffffff&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://www.avikde.me?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png" width="56" height="56" style="background-color: rgb(255, 255, 255);"><span class="embedded-publication-name">min{power}</span><div class="embedded-publication-hero-text">Explorations in computing and robotics focused on power-efficiency and safety -- personal posts by Avik De, robotics Ph.D. 
and founder</div><div class="embedded-publication-author-name">By Avik De</div></a><form class="embedded-publication-subscribe" method="GET" action="https://www.avikde.me/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:2883266,&quot;name&quot;:&quot;It Can Think!&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!13Dp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a886fd-347d-4694-b670-0253975d2ba9_659x547.png&quot;,&quot;base_url&quot;:&quot;https://itcanthink.substack.com&quot;,&quot;hero_text&quot;:&quot;Robotics and AI; the future we're building and how we'll get there&quot;,&quot;author_name&quot;:&quot;Chris Paxton&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#292524&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://itcanthink.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!13Dp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a886fd-347d-4694-b670-0253975d2ba9_659x547.png" width="56" height="56" style="background-color: rgb(41, 37, 36);"><span class="embedded-publication-name">It Can Think!</span><div class="embedded-publication-hero-text">Robotics and AI; the future we're building and how we'll get there</div><div class="embedded-publication-author-name">By Chris Paxton</div></a><form 
class="embedded-publication-subscribe" method="GET" action="https://itcanthink.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div><h2>Trends in Robotics and AI</h2><h3>1) Commoditization</h3><p>Going back to 2013, Avik&#8217;s Ph.D. research included the development of an internal research robot, Minitaur:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4P2l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4P2l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 424w, https://substackcdn.com/image/fetch/$s_!4P2l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 848w, https://substackcdn.com/image/fetch/$s_!4P2l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 1272w, https://substackcdn.com/image/fetch/$s_!4P2l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!4P2l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png" width="515" height="366.31799163179915" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:680,&quot;width&quot;:956,&quot;resizeWidth&quot;:515,&quot;bytes&quot;:838745,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/194565767?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4P2l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 424w, https://substackcdn.com/image/fetch/$s_!4P2l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 848w, https://substackcdn.com/image/fetch/$s_!4P2l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 1272w, https://substackcdn.com/image/fetch/$s_!4P2l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>It took a lot of (Ph.D. student) effort to build the infrastructure, but resulted in a unique development platform that was easy to program (as an Arduino), lightweight and relatively safe (5 kg), and capable of producing very agile and exciting-looking behaviors. There was nothing like it that you could buy. 
All in all, this endeavor to develop a new robot led to <a href="https://www.avikde.me/p/vertical-hopper-compositions">papers</a>, cool movies to show in talks, and even <a href="https://www.ghostrobotics.io/">a startup company</a>.</p><p>In the decade after, four-legged robots started to get out of the research lab and into public consciousness. The show Silicon Valley had a <a href="https://www.businessinsider.com/silicon-valley-google-spot-robot-2016-4">Boston Dynamics Spot cameo in 2016</a>, and robot videos designed to appeal to a broad audience, like <a href="https://www.youtube.com/watch?v=kHBcVlqpvZ8">dancing</a>, started to appear. Four-legged robots were officially out of the lab and in the wild, and this led to increased expectations for what they should do. Stably walking around used to be cutting edge, but became table stakes. Expectations for specs such as reliability, battery life, compute capability, and ruggedness drove designs to be more complex. It became much more difficult for a couple of researchers with minimal engineering experience to put together a new robot. Moreover, after Chinese company Unitree entered the market and <a href="https://kr-asia.com/unitree-robotics-develops-personal-robot-dogs-that-jog-alongside-you">dropped the asking price by almost 30x</a> in 2021, it was no longer worth the time and money to even try.</p><p><strong>The pre-paradigm period of lab-developed robotic hardware is being replaced by algorithm development for commoditized hardware.</strong></p><p>We have seen this play out in several robotics research labs. DJI commoditized consumer drones aggressively from 2013 onward, making it hard to justify custom builds even for capability reasons. By the mid-2010s, labs doing serious flight research (e.g., <a href="https://rpg.ifi.uzh.ch/people_scaramuzza.html">Davide Scaramuzza&#8217;s group</a> at the University of Zurich) were exclusively using commercial platforms. 
ETH Zurich&#8217;s <a href="https://rsl.ethz.ch/research/researchtopics/legged-locomotion.html">Robotic Systems Lab</a> (which built ANYmal originally, and also StarlETH) now deploys its locomotion research on the ANYmal platform rather than building new hardware. <a href="https://bostondynamics.com/blog/what-makes-an-effective-research-robot/">Boston Dynamics has an article</a> that talks about how commercial platforms let researchers hit the ground running.</p><p>Post-commoditization, researchers who want to demonstrate <em>algorithms</em> working on robots can reap the benefits. Humanoid research circa 2015 meant figuring out &#8220;how do we actually build these things and make them not fall over,&#8221; whereas post-commoditization, time can be spent on higher-level algorithms and methods &#8212; we refer to this phenomenon as &#8220;<strong>moving up the stack</strong>.&#8221;</p><p>A secondary effect of commoditization is that <em>parts</em> are now easier to get, and researchers can put together novel modular combinations of more mature components. The WidowX 250 Dynamixel-based arm from Trossen Robotics has become the default low-cost manipulation platform because it is cheap (~$3k) and can be used to create &#8220;leader-follower&#8221; setups for data collection. The <a href="https://arxiv.org/abs/2304.13705">ALOHA paper</a> notes that the whole system with two arms costs ~$20k off-the-shelf. More recently, we have seen <a href="https://yourownrobot.ai/">robots like the YOR</a> assembled from off-the-shelf parts for research purposes. This effect enables new types and form-factors of robots to be built &#8212; <em>we will return to this in the next section</em>.</p><p>The same trend applies to non-hardware <strong>AI research</strong>. Frontier language models cannot really be trained by academic research labs anymore &#8212; research in these areas has moved to fine-tuning commercial models instead. 
The following plots <a href="https://github.com/avikde/robo-research-trends">were generated from arXiv data</a> and confirm these trends toward pretrained model usage in research compared to building them from scratch.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ry1_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ry1_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 424w, https://substackcdn.com/image/fetch/$s_!Ry1_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 848w, https://substackcdn.com/image/fetch/$s_!Ry1_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 1272w, https://substackcdn.com/image/fetch/$s_!Ry1_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ry1_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png" width="384" height="288" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:600,&quot;resizeWidth&quot;:384,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ry1_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 424w, https://substackcdn.com/image/fetch/$s_!Ry1_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 848w, https://substackcdn.com/image/fetch/$s_!Ry1_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 1272w, https://substackcdn.com/image/fetch/$s_!Ry1_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VfeU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VfeU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 424w, https://substackcdn.com/image/fetch/$s_!VfeU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 848w, 
https://substackcdn.com/image/fetch/$s_!VfeU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 1272w, https://substackcdn.com/image/fetch/$s_!VfeU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VfeU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png" width="388" height="291" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:600,&quot;resizeWidth&quot;:388,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VfeU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 424w, https://substackcdn.com/image/fetch/$s_!VfeU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 848w, 
https://substackcdn.com/image/fetch/$s_!VfeU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 1272w, https://substackcdn.com/image/fetch/$s_!VfeU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The Qwen series of models by Alibaba have nearly taken over the research world by facilitating fine-tuning. 
In 2026, few academics would think of training their own language models, or even vision-language models, from scratch. Why would you, when Qwen 3.5 can already beat anything within reach of an ordinary academic lab?</p><p>Just as with robotics hardware, <strong>the pre-paradigm period of lab-developed models is being replaced by fine-tuning commercial models</strong>.</p><p>Here as well, there are research ideas that can be pursued by <strong>moving up the stack</strong>: agentic reasoning, reinforcement learning, world representations, novel model architectures, etc. Robotics models are not like language models: there are fewer real-world benchmarks, and even within end-to-end deep learning plenty of ideas seem to remain unexplored.</p><h3>2) Architectural Convergence</h3><p>Labs used to have a narrower focus in which they could carve out a niche, e.g. computer vision or legged locomotion. However, for a robot to perform complex sensorimotor tasks, you need <a href="https://open.substack.com/pub/minpower/p/the-architecture-behind-end-to-end">all of the Sense-Plan-Act functions implemented in some way</a>. If you subscribe to the bitter lesson, even the best computer vision algorithm, when connected through hand-crafted interfaces to a planner and other downstream systems, cannot compete with end-to-end systems. 
General-purpose manipulation / locomotion research is <a href="https://itcanthink.substack.com/p/interesting-directions-in-vision">converging on behavior cloning and VLAs</a> since it works well enough across many tasks, and performance improves with larger models and more data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qAhH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qAhH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 424w, https://substackcdn.com/image/fetch/$s_!qAhH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 848w, https://substackcdn.com/image/fetch/$s_!qAhH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 1272w, https://substackcdn.com/image/fetch/$s_!qAhH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qAhH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png" width="1456" height="934" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:934,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qAhH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 424w, https://substackcdn.com/image/fetch/$s_!qAhH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 848w, https://substackcdn.com/image/fetch/$s_!qAhH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 1272w, https://substackcdn.com/image/fetch/$s_!qAhH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Behavior cloning with VLAs (source: <a href="https://www.pi.website/research/human_to_robot">Physical Intelligence</a>)</figcaption></figure></div><p>This trend has pushed many previously diverse labs toward developing end-to-end models, significantly reducing the diversity and richness of the research ecosystem. For better or worse, we appear to be solidly in a <strong>paradigm of behavior cloning with end-to-end models</strong>.</p><p>This has several benefits for researchers: they can easily build on existing work without reinventing the wheel, and it creates scaffolding for new contributions. However, it also has the side effect of suppressing other schools of thought. In Kuhn&#8217;s somewhat ominous words,</p><blockquote><p>But there are always some men who cling to one or another of the older views, and they are simply read out of the profession, which thereafter ignores their work. The new paradigm implies a new and more rigid definition of the field. 
Those unwilling or unable to accommodate their work to it must proceed in isolation or attach themselves to some other group.</p></blockquote><p>How can research labs and out-of-paradigm ideas stand out amid this homogenization and consolidation? The next section discusses what we can learn from computer engineering.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h2>What we can learn from Computer Engineering</h2><p>By necessity, computer engineering has always been a bit ahead of robotics on the same technology curve. After all, we needed chips to perform the computations that make robots work.</p><p>There we saw a similar <strong>commoditization</strong> trend, with hardware complexity outgrowing what a research lab could build. The initial university fab era was anchored by <a href="https://en.wikipedia.org/wiki/VLSI_Project">DARPA&#8217;s VLSI Project</a>, which produced BSD Unix, the RISC concept, and MOSIS (a shared fabrication service for academia). Once that era ended, academic research pivoted to what could be done without a fab.</p><p>In response, computer engineering offers a good set of examples of <strong>moving up the stack</strong> (transistors &#8594; meta-design tools and ISAs). Circa 2010, rather than building chips, Krste Asanovi&#263;&#8217;s group at Berkeley <a href="https://people.eecs.berkeley.edu/~krste/papers/EECS-2014-146.pdf">designed the open RISC-V ISA</a>, explicitly motivated by the problem of proprietary architectures impeding academic research. 
With <a href="https://github.com/chipsalliance/chisel">Chisel</a> (Berkeley), academics built better tools for designing chips by expressing hardware designs in a high-level language, and it became the foundation for many RISC-V implementations.</p><p>In addition, CPU architectures converged on x86 for desktop and ARM for mobile because they worked well enough for most workloads, and design costs could be amortized across different applications: a <strong>general-purpose computing paradigm</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ncaw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ncaw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 424w, https://substackcdn.com/image/fetch/$s_!Ncaw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 848w, https://substackcdn.com/image/fetch/$s_!Ncaw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 1272w, https://substackcdn.com/image/fetch/$s_!Ncaw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ncaw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png" width="583" height="393.6105675146771" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:690,&quot;width&quot;:1022,&quot;resizeWidth&quot;:583,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ncaw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 424w, https://substackcdn.com/image/fetch/$s_!Ncaw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 848w, https://substackcdn.com/image/fetch/$s_!Ncaw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 1272w, https://substackcdn.com/image/fetch/$s_!Ncaw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Performance saturation from the end of Dennard scaling (source: H&amp;P 2017 lecture).</figcaption></figure></div><p><a href="https://dl.acm.org/doi/10.1145/3282307">Hennessy and Patterson&#8217;s 2017 Turing Award lecture</a> argued that the post-Dennard-scaling era opens up a new window for research in domain-specific accelerators, where the design space is exploratory again. 
Coincident with the success of deep neural networks, the last few years have seen a <a href="https://thechipletter.substack.com/p/ai-accelerators-the-cambrian-explosion">Cambrian explosion in AI accelerators</a>, ushering in much more innovation in computer architecture and silicon than was possible in CPUs.</p><p>In other words, computer engineering&#8217;s paradigm shift resulted in <strong>domain-specific diversification</strong>.</p><p>How do these lessons apply to robotics and AI?</p><p>Just as chip fabrication leaving academia didn&#8217;t end computer architecture research, robotics research will find a home in core algorithms, training methodologies, and novel architectures. While papers can continue to be written on new methods and algorithms, the flashy demonstrations (important for fundraising and PR) may unfortunately move beyond the reach of academic labs. Similar to how ChatGPT capitalized on published transformer research, companies will capitalize on openly published research. It may become crucial to have a mechanism that credits academics for commercial use of their work (academic metrics such as the h-index do not capture this).</p><p>The largest robotics companies are converging on general-purpose humanoids, optimizing for the broadest possible applicability and commercial value. By analogy to computer engineering&#8217;s <strong>domain-specific diversification</strong>, the next productive frontier for academic labs may be task-specific robots: surgical, agricultural, soft robots, etc., which diverge enough from general-purpose designs to make bespoke solutions worthwhile. 
A positive side effect of the commoditization of hardware components (actuators, IMUs, and perception systems like the Kinect) is that it facilitates this kind of development.</p><h2>The Future</h2><p>While the external perception of robotics and AI research is that we are undergoing a revolution today, the internal view is more consistent with <em>commoditization</em> and <em>convergence</em>. This paradigm has had a lot of positive side effects, like establishing a framework and shared infrastructure, but also some serious downsides, like stifling research that doesn&#8217;t fit the mold.</p><p>In response, we already see robotics research <strong>moving up the stack</strong>, and we may begin to see examples of <strong>domain-specific diversification</strong> if the largest companies with the largest datasets corner the end-to-end behavior cloning approach.</p><p>Beyond that, it&#8217;s too early to predict whether a paradigm shift is coming. Kuhn says on this topic:</p><blockquote><p>Sometimes a normal problem, one that ought to be solvable by known rules and procedures, resists the reiterated onslaught of the ablest members of the group within whose competence it falls. 
On other occasions a piece of equipment designed and constructed for the purpose of normal research fails to perform in the anticipated manner, revealing an anomaly that cannot, despite repeated effort, be aligned with professional expectation.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YrAc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YrAc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 424w, https://substackcdn.com/image/fetch/$s_!YrAc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 848w, https://substackcdn.com/image/fetch/$s_!YrAc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 1272w, https://substackcdn.com/image/fetch/$s_!YrAc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YrAc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png" width="233" height="229" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:229,&quot;width&quot;:233,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YrAc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 424w, https://substackcdn.com/image/fetch/$s_!YrAc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 848w, https://substackcdn.com/image/fetch/$s_!YrAc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 1272w, https://substackcdn.com/image/fetch/$s_!YrAc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Kuhn cycle (<a href="https://www.thwink.org/sustain/glossary/KuhnCycle.htm">source</a>)</figcaption></figure></div><p>Will there be a &#8220;piece of equipment&#8221; or &#8220;normal problem&#8221; whose unexpected result paves the way for the next robotics revolution? 
Optimistically, it seems like the current paradigm still has legs for a little while longer, but there is already work at the fringes looking toward the next set of leaps, like world model research, neuromorphic computing, etc. We&#8217;ll be writing about these topics over the coming weeks and months; stay tuned!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><p><em>If you enjoyed this post, please like (&#10084;&#65039;) and restack &#8212; it helps others find my writing. Subscribe to receive new posts. All of this is greatly appreciated.</em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>While originally intended for scientific fields, the <a href="https://www.sciencedirect.com/science/article/abs/pii/0048733382900166">idea has been extended</a> to broader technological fields.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Honor's humanoid ran the fastest half-marathon: how did they do it?]]></title><description><![CDATA[Engineering isn't magic, it's a matter of tradeoffs]]></description><link>https://www.avikde.me/p/honors-humanoid-ran-the-fastest-half</link><guid isPermaLink="false">https://www.avikde.me/p/honors-humanoid-ran-the-fastest-half</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Wed, 22 Apr 2026 20:11:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!S69N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>Robotics headlines over the past week have been dominated by the news that the <a href="https://www.cnn.com/2026/04/19/china/china-robot-half-marathon-intl-hnk">Honor Lightning humanoid robot has beaten the human half marathon world record</a> for the first time. It&#8217;s important to remember that machines and humans have very different capabilities and constraints, so why should we ever have expected the half-marathon times of a robot and a human to be related? Down the line, I don&#8217;t expect this particular comparison of human to machine to be very relevant. Nevertheless, it&#8217;s still an important milestone for engineering, just like <a href="https://en.wikipedia.org/wiki/Deep_Blue_versus_Garry_Kasparov">Deep Blue&#8217;s 1997 defeat of Garry Kasparov in chess</a>. From a human standpoint, I hope we can resist comparing the accomplishments of machines to the well-earned and deserved achievements of humans&#8230; maybe the chess model is a reasonable one here. Also, as in the chess case, where Deep Blue couldn&#8217;t physically move the pieces, the Honor robot&#8217;s capabilities are much narrower than those of a human, who runs elbow-to-elbow with other runners, effortlessly navigates the course without GPS, etc. Comparing the robot runner to a human runner is an apples-to-oranges comparison.</p><p>What <em>is</em> a good comparison is between this performance and last year&#8217;s, when the best robot time was over 160 minutes, or more than 3x this year&#8217;s time. That&#8217;s a remarkable improvement in one year. My doctoral thesis involved <a href="https://www.avikde.me/p/phd-defense">building and controlling hopping and running robots</a>, and <a href="https://www.avikde.me/p/ghost-robotics-minitaur">since then I&#8217;ve tried to design and build efficient commercial legged robots</a>, giving me a decent idea of the constraints involved. 
So, in this article, I want to examine how they did it. Is there some magical technology or technique that unlocked this performance? How did they beat the significantly better-known Unitree (which reportedly resorted to an <a href="https://x.com/TheHumanoidHub/status/2045702643449037287">ice pack backpack</a> to try to complete the race without overheating)? Could a Western robot have won?</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><em>This publication and this post contain the author&#8217;s personal thoughts and opinions only, and do not reflect the views of any companies or institutions.</em></p><h2>The basic physics of hopping and running</h2><p>Hopping, very simply, consists of alternating phases of a leg pushing against the ground (&#8220;stance phase&#8221;) and the body flying through the air (&#8220;aerial phase&#8221;).</p><p>In the aerial phase, the body is in free fall (constant downward acceleration due to gravity). You can think of this as the body continuously losing upward momentum. In the stance phase, the job of the leg is to push against the ground to reverse this vertical momentum. 
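</p><p>To make this concrete, here is a rough impulse balance. It is a sketch under simplifying assumptions (steady, periodic running and an average force), not a model of the Honor robot. The symbols <em>m</em> (robot mass), <em>g</em> (gravitational acceleration), <em>T<sub>a</sub></em> (aerial duration), <em>T<sub>s</sub></em> (stance duration), and <em>F</em> (average vertical leg force during stance) are introduced here for illustration. Over one step, gravity removes <em>m g</em> (<em>T<sub>s</sub></em> + <em>T<sub>a</sub></em>) of vertical momentum, and the leg must return all of it during stance:</p><p><em>F T<sub>s</sub></em> = <em>m g</em> (<em>T<sub>s</sub></em> + <em>T<sub>a</sub></em>), so <em>F</em> = <em>m g</em> (1 + <em>T<sub>a</sub></em> / <em>T<sub>s</sub></em>)</p><p>For a hypothetical 50 kg robot with equal stance and aerial durations, the average leg force is twice body weight (roughly 980 N). Halving the stance duration at the same aerial duration raises it to three times body weight.</p><p>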
The job of the &#8220;knee&#8221; actuator is primarily to generate this force in stance phase.</p><p>The other basic leg function is repositioning for the next foothold. In bipedal running, while one leg is pushing against the ground, the other leg is swinging to reposition for the next step. The job of the &#8220;hip&#8221; actuator is primarily to swing the leg forward.</p><p>Bipedal running is simply these two functions alternating in the two legs: while the left leg pushes against the ground, the right leg swings forward, and vice versa. Of course, this is an oversimplification in many ways, but it still captures the main effects that contribute to running energetics. Namely, it becomes clear that:</p><ul><li><p>the knee actuator must produce enough torque to reverse the robot&#8217;s entire vertical momentum within the stance duration <em>T<sub>s</sub></em></p></li><li><p>the hip actuator must produce enough power to accelerate the leg forward within the swing duration <em>T<sub>sw</sub></em></p></li></ul><p>A robot runs faster by increasing its stride length and/or shortening its stance duration.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RCSB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RCSB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 424w, https://substackcdn.com/image/fetch/$s_!RCSB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 848w, 
https://substackcdn.com/image/fetch/$s_!RCSB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 1272w, https://substackcdn.com/image/fetch/$s_!RCSB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RCSB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png" width="494" height="277.5809523809524" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:354,&quot;width&quot;:630,&quot;resizeWidth&quot;:494,&quot;bytes&quot;:44169,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/194835386?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RCSB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 424w, https://substackcdn.com/image/fetch/$s_!RCSB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 
848w, https://substackcdn.com/image/fetch/$s_!RCSB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 1272w, https://substackcdn.com/image/fetch/$s_!RCSB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">A depiction of a single-leg hopper&#8217;s stance phase showing the reversal of vertical momentum and the 
maintenance of horizontal momentum, as well as the stride length and the stance duration. Source: &#8220;Legged Robots That Balance&#8221;.</figcaption></figure></div><p>Shortening the stance duration requires more knee torque to accomplish the same momentum reversal, since the same impulse has to be delivered in less time. Swinging the leg faster and covering a longer stride length require more torque and power from the hip actuator.</p><p>And just like that, with very basic physics, we&#8217;ve recovered the dependence of running speed on the torque and power produced by the actuators.</p><h2>The basic physics of motors</h2><p>Electric motors dissipate heat in a fixed relation to the torque they produce, and the two are related by the aptly named <em>motor constant</em>, <em>K<sub>m</sub></em>. If <em>&#964; </em>is the torque produced by the motor and <em>Q</em> is the rate at which it produces heat (in watts),</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;K_m := \\frac{\\tau}{\\sqrt{Q}}&quot;,&quot;id&quot;:&quot;FTAAGSQRRL&quot;}" data-component-name="LatexBlockToDOM"></div><p>In the &#8220;New Motor Models&#8221; section in <a href="https://repository.upenn.edu/entities/publication/10b266fd-41d2-49b6-ac90-0ee614bca00a">my thesis (2017)</a> I described how a <em>K<sub>m</sub></em> scaling relation can be approximated from rough first-principles geometry arguments. In particular, for a fixed length scale, <em>K<sub>m</sub></em> scales with the square root of motor mass &#8730;<em>m</em>. 
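</p><p>As a quick illustration of the two relations above: rearranging the definition gives <em>Q</em> = (<em>&#964;</em>/<em>K<sub>m</sub></em>)<sup>2</sup>, so heat grows with the square of torque. The sketch below is only an example under stated assumptions, not the script used for this post. The function names and torque values are hypothetical, and the 1.52 Nm/sqrt(W) figure is borrowed from the extrapolated motor discussed later in this post.</p>
<pre>
```python
import math


def heat_watts(torque_nm, k_m):
    """Heat rate needed to produce a torque, from K_m = torque / sqrt(Q)."""
    return (torque_nm / k_m) ** 2


def scaled_k_m(k_m, mass_ratio):
    """K_m of a motor mass_ratio times heavier, at a fixed length scale."""
    return k_m * math.sqrt(mass_ratio)


K_M = 1.52  # Nm/sqrt(W), value quoted later in the post for the larger motor

# Hypothetical motor torques: doubling the torque quadruples the heat.
q_low = heat_watts(5.0, K_M)
q_high = heat_watts(10.0, K_M)
print(round(q_high / q_low, 2))

# A motor twice as heavy dissipates half the heat for the same torque.
q_heavy = heat_watts(10.0, scaled_k_m(K_M, 2.0))
print(round(q_heavy / q_high, 2))
```
</pre>
<p>The quadratic dependence on torque is why motor sizing and gearing matter so much for the thermal budget in the sections that follow.</p><p>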
In a <a href="https://robot-daycare.com/posts/actuation_series_1/">recent post</a>, longtime blogger and roboticist Ben Katz generalizes this and gives the coefficient a name, the &#8220;figure of merit (FoM),&#8221; which we can use here:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathrm{FoM}:=\\frac{K_m}{r\\sqrt{m}}&quot;,&quot;id&quot;:&quot;ICOKXYEZKV&quot;}" data-component-name="LatexBlockToDOM"></div><p>The <em>r</em> above is the motor diameter. To estimate motor mass <em>m</em>, I decided to relate it to the motor diameter and (unknown) length. With these, and assuming a high but reasonable FoM of 15, we can extrapolate the likely <em>K<sub>m</sub></em>.</p><p>To estimate the rotor inertia, we can relate it to the motor mass and size as <em>j ~ mr<sup>2</sup></em>, as Ben Katz also does.</p><p>Adding a geartrain (gear ratio <em>G</em>) after the motor amplifies its torque and reduces its speed by a factor of <em>G</em>. So, it helps with torque production, but it has a very deleterious effect in legged systems when accelerating. Since the rotor has to spin <em>G</em> times faster than the output, the rotor inertia <em>j</em> reflected to the output frame appears scaled to <em>G<sup>2</sup>j</em>, which can quickly become very large. Thus, a small motor with large gearing becomes very sluggish at accelerating its output, even if it can statically produce a large torque. This is obviously bad for the &#8220;swing phase&#8221; described above.</p><h2>The Honor Lightning&#8217;s technology</h2><p>There isn&#8217;t a technical report on this robot as far as I know, but some online articles list a few specifications. I referred to <a href="https://chinaresearchcollective.substack.com/p/honors-autonomous-humanoid-robot">this Substack article</a> for this post. A couple of notes:</p><ul><li><p>This article and a few others say that the robot has 55 joints, but that is definitely a mistake. 
Potentially with hands (that were not equipped on these half-marathon versions) it could have 55 joints, but as deployed, they probably had closer to half as many joints.</p></li><li><p>The page also lists &#8220;Leaderdrive&#8221; as a harmonic reducer technology partner implying that strain wave gearing was used. However, based on the analysis below, a lower reduction-ratio planetary or another type of gearing is more appropriate, especially for this kind of efficiency-critical application.</p></li></ul><h3>Actuation: motor, gearing, gait</h3><p>These three factors are all interrelated and have an effect on how much energy is required and how much heat is produced. To see how, let&#8217;s start with the motor.</p><p>Typically, the motor <em>K<sub>m</sub></em> can be found in the datasheet, but in this case there&#8217;s no public reporting on the motor specs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S69N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S69N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 424w, https://substackcdn.com/image/fetch/$s_!S69N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 848w, https://substackcdn.com/image/fetch/$s_!S69N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 1272w, 
https://substackcdn.com/image/fetch/$s_!S69N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!S69N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png" width="860" height="573" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:573,&quot;width&quot;:860,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:683697,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/194835386?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!S69N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 424w, https://substackcdn.com/image/fetch/$s_!S69N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 848w, 
https://substackcdn.com/image/fetch/$s_!S69N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 1272w, https://substackcdn.com/image/fetch/$s_!S69N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The Honor Lightning robot. 
Source: CNN.</figcaption></figure></div><p>However, we can see the size of the fairly large hip/knee motors attached to the upper leg (my rough estimation is that the outer diameter is somewhere between 110-150mm from the image above). We can look at a couple of potential options: first, a reasonable 115mm diameter catalog motor, which I chose from TQ&#8217;s frameless motor catalog for similar reasons to Ben Katz&#8217;s blog post &#8212; they are well-documented and have a large selection. Second, we can use the scaling principles to make some reasonably good approximations of <em>K<sub>m</sub></em> for a hypothetical larger motor. I extrapolated to a 150x25 sized motor to obtain a <em>K<sub>m</sub></em> of 1.52 Nm/sqrt(W), and a mass of almost 2 kg.</p><p>Since we don&#8217;t know the gear ratio, we can use our simple physics model (script linked in references below) to estimate the power consumption for running for the &#8220;small&#8221; and &#8220;big&#8221; motors above as a function of <em>G</em>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DT76!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DT76!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 424w, https://substackcdn.com/image/fetch/$s_!DT76!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 848w, 
https://substackcdn.com/image/fetch/$s_!DT76!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 1272w, https://substackcdn.com/image/fetch/$s_!DT76!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DT76!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png" width="380" height="314.25219941348973" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:564,&quot;width&quot;:682,&quot;resizeWidth&quot;:380,&quot;bytes&quot;:85346,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/194835386?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DT76!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 424w, https://substackcdn.com/image/fetch/$s_!DT76!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 
848w, https://substackcdn.com/image/fetch/$s_!DT76!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 1272w, https://substackcdn.com/image/fetch/$s_!DT76!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Note that:</p><ul><li><p>A high gear ratio is nice to minimize the power in the knee actuator (since its job of supporting the 
robot weight is made easier with mechanical advantage), but a high gear ratio also makes the leg swing energetically difficult.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> There&#8217;s usually a middle-ground optimum.</p></li><li><p>The larger motor (150x25) prefers a smaller gear ratio (~23:1), and the smaller motor (115x25) prefers a higher gear ratio (~40:1).<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> These are shown with the dashed gray lines.</p></li><li><p>Both of these options seem to be able to accomplish the basic push-ground and leg-swing functions, with modest robot power consumption of 400-500 W.</p></li></ul><p>So, in sum, the <strong>motor is not magical technology</strong> and in fact, a range of existing or projected options would work, <em>when appropriately sized for this task.</em><strong> </strong>I&#8217;ll get back to this last bit and the green lines in the plot later.</p><p>The dissipated knee power (which is typically the main thermal limiting factor) is ~150W for both solutions. This is almost an unavoidable consequence &#8212; due to the predictable scaling of motor <em>K<sub>m</sub></em>, running at human speeds with a humanoid-sized robot will inevitably generate this amount of heat!</p><p>This, finally, is where we would see a potentially large difference between the two motors. Motor cooling is affected by the surface area over which heat removal can occur, and the larger motor has 70% more surface area. 
Even so, over a prolonged period, 150W is a large amount of power to dissipate from a single motor, and this is where one of the stated innovations in this robot design appears to come into play (<a href="https://eu.36kr.com/en/p/3775418378027520">source</a>):</p><blockquote><p>According to Honor, the liquid-cooling pipes penetrate deep into the motors like capillaries. The high-power liquid pump has a heat-exchange flow rate of more than 4 liters per minute. Each of the four drive motors in the lower limbs is equipped with an independent liquid-cooling circuit.</p></blockquote><p>Liquid cooling is not new, but it&#8217;s definitely not what I would call a commodity. It has shown up in research periodically, and on the commercial side <a href="https://apptronik.com/news-collection/apptronik-readies-its-humanoid-robot-for-a-summer-unveil">Apptronik tried it for a few of their prototypes</a> but (to my knowledge) does not use it on their main Apollo platform. While it definitely is not magical technology, it has been niche so far. As described above, it is absolutely essential (and so far quite challenging) to be able to dissipate ~150W from a motor for running at these speeds. In that respect, the <strong>liquid cooling tech is a key enabler</strong> of this type of performance.</p>
<p><strong>Caveat: </strong>The script I used to generate the plots above makes a lot of simplifying approximations. It doesn&#8217;t capture the energy dissipated in other motors (arms, ankles, abduction, etc.). The basic physics principles don&#8217;t lie about the periodic center-of-mass behavior, but the model doesn&#8217;t capture other oscillations in orientation as the body sways, or losses like friction and air resistance. The inertia of the leg is left out of the swing inertia calculation, since there is no way to approximate it properly with the information available. Published materials emphasize a lightweight leg construction, which indicates that the reflected rotor inertia will likely dominate it (and so the script&#8217;s approximation is likely good). There are more accurate ways to estimate the swing energetics incorporating the leg kinematics and swing trajectory, but I chose to err on the side of simplicity rather than add complexity to this analysis. Still, I think the main estimates and talking points (motor / gearing selection for the knee motor, and power dissipated in it) can be trusted.</p><h3>AI and autonomy</h3><p>There&#8217;s nothing to write home about here. The gait controller could have been either a reinforcement learning (RL) controller, which is easy to train for flat ground, or a model-based controller. The autonomous navigation system used a provided GNSS system and just had to follow the route waypoints. 
This is all very well-understood technology.</p><h3>Battery</h3><p>Let&#8217;s assume that the battery was chosen to last 1.5 hrs (the robot finished in &lt; 1 hr). For 600 W consumption (based on the figures above with some buffer), the battery would have needed 900 Wh of capacity (600 W &#215; 1.5 h), and at 300 Wh/kg energy density, the pack would have weighed 3 kg. This is well within reason for a 45 kg robot. Additionally, a 1.5-hour discharge time corresponds to a 1/1.5 &#8776; 0.67C discharge rate, which is well within the ratings of most existing batteries.</p><p>The Unitree H1 reportedly needed &#8220;<a href="https://www.instagram.com/p/DXZV1x9DEAp/">pit stops</a>&#8221; and battery cooling ice, indicating that it was consuming much more power. We&#8217;ll talk about that more next.</p><h2>Engineering always involves tradeoffs</h2><p>Engineering is always characterized by tradeoffs &#8212; that&#8217;s what makes it challenging but also fun. Especially today with ever-stronger AI language models, the very human skill of judgment and knowing how to make tradeoffs is much more important than the rote work of completing a design to spec.</p><p>Even with the very simple model above, it was not hard to roughly design a drivetrain that is theoretically capable of this feat. 
Then why did the competitors in the race, including more <a href="https://www.forbes.com/sites/johnkoetsier/2026/01/09/top-10-humanoid-robot-companies-by-shipments-revealed/">established and widely-shipped humanoids</a> such as from Unitree or Agibot, not compete as well?</p><p>We can use the simple model to generate an equivalent energetics plot for walking at 1.5 m/s, a much more modest but potentially more common activity for a commercial humanoid robot:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Gxy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Gxy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 424w, https://substackcdn.com/image/fetch/$s_!5Gxy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 848w, https://substackcdn.com/image/fetch/$s_!5Gxy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 1272w, https://substackcdn.com/image/fetch/$s_!5Gxy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5Gxy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png" width="422" height="351.45671641791046" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:558,&quot;width&quot;:670,&quot;resizeWidth&quot;:422,&quot;bytes&quot;:84359,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/194835386?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Gxy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 424w, https://substackcdn.com/image/fetch/$s_!5Gxy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 848w, https://substackcdn.com/image/fetch/$s_!5Gxy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 1272w, https://substackcdn.com/image/fetch/$s_!5Gxy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The gray lines are as before &#8212; gear ratios optimized for half-marathon running. The green lines are where the power is minimized for walking, and they are significantly different!</p><p>Let&#8217;s say you design your robot to excel at the normal walking task and choose the green gear ratios. Running a half marathon with that green design draws &gt; 300 W of knee motor power, more than 2x what we had with the running-optimized gray designs. 
It wouldn&#8217;t be so surprising to need ice packs!</p><p>Conversely, the running-optimized gray design, when used for the walking task, wastes significantly more motor power than the green designs (as seen from where they intersect the blue curves). We couldn&#8217;t model this effect with the information available, but using larger motors sized for running also increases the weight of the robot and constantly wastes power when it isn&#8217;t running at full speed. You can visually see the difference in motor sizes between the Unitree H1 and Honor Lightning:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kC_0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kC_0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 424w, https://substackcdn.com/image/fetch/$s_!kC_0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 848w, https://substackcdn.com/image/fetch/$s_!kC_0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 1272w, https://substackcdn.com/image/fetch/$s_!kC_0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!kC_0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png" width="1456" height="1285" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1285,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2183576,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/194835386?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kC_0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 424w, https://substackcdn.com/image/fetch/$s_!kC_0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 848w, https://substackcdn.com/image/fetch/$s_!kC_0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 1272w, https://substackcdn.com/image/fetch/$s_!kC_0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The larger motors will have all sorts of practical (if not fundamental) consequences like bumping into objects while operating in homes or factories.</p><h2>Closing thoughts</h2><p>What should we conclude from Honor&#8217;s accomplishment? First, the capillary motor cooling solution, if mass manufacturable, is a genuine advance, and I suspect this running pace would not have been sustainable without it. Second, even if there wasn&#8217;t any &#8220;magic&#8221; needed, this was a really impressive engineering effort and result. 
For better or worse, it deserves to be a landmark comparable to Deep Blue v. Kasparov.</p><p>Having said that, I don&#8217;t believe this says anything at all about human half-marathon performances. It doesn&#8217;t even imply that a humanoid robot could join a race among a sea of people without GPS and resiliently finish the race. I wish those comparisons would be left out of the press coverage.</p><p>Another thing I found interesting is that the Lightning robot was reportedly developed in about a year, between MWC in March 2025 and the April 2026 race. That is incredibly fast. However, what is even more stunning is that the R&amp;D team <a href="https://chinaresearchcollective.substack.com/p/honors-autonomous-humanoid-robot">reportedly had 2,600 people</a>. For comparison with a few US humanoid robot companies, to my knowledge that eclipses the headcounts of Boston Dynamics, Figure, Agility, and Apptronik combined (I am not sure of Tesla&#8217;s Optimus-specific headcount). On top of that, you have to account for the partner and manufacturing ecosystem that was brought to bear, as reported by the same linked article.</p><p>Is all this worth it? It probably isn&#8217;t for most of these companies, which need to spend their resources developing applications customers need and will pay for, but the cooling and weight-reduction advances may well be useful for more practical purposes like carrying heavy payloads down the line.</p><p><em>If you enjoyed this post, please like (&#10084;&#65039;) and restack; it helps others find my writing. Subscribe to receive new posts. 
All of this is greatly appreciated.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h2>References</h2><ul><li><p><a href="https://gist.github.com/avikde/496d108195a040763fd9b610f870d071">Script used for power estimates</a> (GitHub gist)</p></li><li><p><a href="https://docs.google.com/spreadsheets/d/1spBdXsc9IK0wgs-ISgCVRF1hi4WsSuF2xuNKQCzFoPk/edit?gid=0#gid=0">Spreadsheet with motor parameters and estimates</a></p></li></ul><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>This simplification makes it seem like one could just have a heavily geared knee motor and a lightly geared hip motor then, but this breaks down when you actually consider the full leg kinematics. Many of the leg joints participate in force production and swing, and one isn&#8217;t isolated to the knee motor like our cartoon might suggest. Additionally, a photo of the Honor robot strongly suggests that the hip and knee motors are similar if not identical. 
For the level of detail (and guesswork) of this article, we must assume that they are the same.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>The larger motor will also make the whole robot heavier, but we don&#8217;t have sufficient information to predict exactly how much, so we have to ignore this effect.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Building a reasoning hierarchical robotics pipeline from scratch]]></title><description><![CDATA[Part 5: A demo combining the best features of end-to-end and classical approaches]]></description><link>https://www.avikde.me/p/building-a-reasoning-hierarchical</link><guid isPermaLink="false">https://www.avikde.me/p/building-a-reasoning-hierarchical</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Tue, 07 Apr 2026 16:51:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!80Er!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>End-to-end Vision-Language-Action (VLA) models bundle perception, reasoning, and motor control into a single network, but that means the camera, kinematics, and training scenarios are all baked in together. This could cause <a href="https://www.avikde.me/debugging-as-architecture-insight">unexpected</a> and <a href="https://www.avikde.me/a-coding-agent-equivalent-for-robotics">unresolvable</a> issues when the task, embodiment, or environment changes.</p><p>To showcase some of the insights from the past articles in this series, I&#8217;ve put together a demo that you can try out, modify, and learn from. 
This demo combines the flexible task programming and reasoning of the Gemini ER Vision-Language Model (what is the scene, and what should I do?) with classical camera calibration, kinematics, and motion controllers.</p><p>This post describes how it is put together, goes over some of its interesting capabilities, and discusses the aspects of its design that directly impact its strengths and weaknesses. To conclude, we will compare this approach against fully modular (model-based) as well as fully end-to-end methods. The <a href="https://github.com/avikde/vla-pipeline">code is open source</a>, and I&#8217;m putting the ideas out there for discussion and feedback.</p><p><em>This article is the last part of a series on end-to-end robotics pipelines. Links to the other articles are below.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Trying out the demo</h2><p>To make it as accessible as possible, the demo runs in the browser with no software installation required, and can be accessed from your computer or even a phone. 
Click this button to open the page:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://avikde.github.io/vla-pipeline/&quot;,&quot;text&quot;:&quot;Link to demo&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://avikde.github.io/vla-pipeline/"><span>Link to demo</span></a></p><p>The environment is set up for tabletop manipulation with a robot arm. The colored blocks are objects that we can instruct the arm to move, the &#8220;plates&#8221; can serve as potential goal locations, and the gray cylinders can serve as obstacles to be avoided.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!80Er!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!80Er!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 424w, https://substackcdn.com/image/fetch/$s_!80Er!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 848w, https://substackcdn.com/image/fetch/$s_!80Er!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 1272w, https://substackcdn.com/image/fetch/$s_!80Er!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!80Er!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png" width="698" height="365.6826003824092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2773110-8a13-44cc-a173-9181feb51737_1046x548.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:548,&quot;width&quot;:1046,&quot;resizeWidth&quot;:698,&quot;bytes&quot;:636439,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/193310864?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b73ae2-6c31-4e28-b0e8-1495ef5c3817_1046x760.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!80Er!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 424w, https://substackcdn.com/image/fetch/$s_!80Er!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 848w, https://substackcdn.com/image/fetch/$s_!80Er!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 1272w, 
https://substackcdn.com/image/fetch/$s_!80Er!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">What the <a href="https://avikde.github.io/vla-pipeline/">demo</a> scene looks like</figcaption></figure></div><p>The demo uses a Gemini Robotics ER model for task reasoning and perception. 
To try it out, you need to grab your own <a href="https://ai.google.dev/gemini-api/docs/api-key">Gemini API key</a> (free tier), or use the pre-baked fallback plan, which will execute the &#8220;Put the blocks away where they belong&#8221; default task. Then click &#8220;Run Task&#8221; (with an API key) or &#8220;Use Cached Task&#8221; and watch! Use the mouse to orbit the camera, and check the console for debug logs.</p><h3>What it does well</h3><p><strong>Flexible task programming and reasoning. </strong>Tasks can be prompted without needing task-specific programming, which is a major selling point: the possible tasks are not limited by what is programmed at the factory. Gemini processes the prompt together with the scene and can break down multi-step tasks. We&#8217;ll go over how Gemini&#8217;s outputs are used by the rest of the system below.</p><p>Results from some tasks:</p><blockquote><p>&#8220;Place the red block on the blue target&#8221;</p></blockquote><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;a5296793-58f7-4a44-9e61-c8dd633f87f2&quot;,&quot;duration&quot;:null}"></div><p>This simple task shows the VLM&#8217;s visual and task understanding. Additionally, its language understanding can parse semantically similar words in the context of the scene (e.g. block vs. cube, or plate vs. coaster vs. target).</p><p>The video also shows the <strong>reactive obstacle avoidance</strong> that keeps the arm from colliding with the cylindrical obstacles. This capability, with associated safety benefits, does not require any training or motion primitives to be built into the VLM. 
More on that below.</p><div><hr></div><blockquote><p>&#8220;Put the blocks on matching targets&#8221;</p></blockquote><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;be487580-1c33-4c89-acc2-549f274f5546&quot;,&quot;duration&quot;:null}"></div><p>The VLM successfully reasons that blocks go on color-matched plates, and breaks down the task into a number of steps (move red block, move blue block).</p><div><hr></div><blockquote><p>&#8220;Swap the red and blue blocks&#8221;</p></blockquote><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;71d58308-02b2-4c31-a33a-935c39af17ae&quot;,&quot;duration&quot;:null}"></div><p>This task requires a multi-step plan to move one of the blocks out of the way first, and the selection of a free location to store it.</p><p>The wireframes displayed in the animation show the <strong>spatial understanding</strong> ability built from a combination of a Gemini <strong>VLM with classical computer vision</strong>. Objects in the scene are semantically classified &#8212; into objects (blue wireframes), potential goal locations (green), and potential obstacles (black) &#8212; by the VLM guided by prompting, without hardcoding.</p><p>As a note of caution, I had a few runs where it chose the &#8220;free&#8221; location incorrectly on top of another block.</p><div><hr></div><blockquote><p>&#8220;Put away the blocks&#8221;</p></blockquote><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;8efce8e3-b589-47eb-b3a6-391097177f42&quot;,&quot;duration&quot;:null}"></div><p>The success of this (underspecified) prompt showcases the language and intent understanding of the VLM. 
However, I will temper that with the note that in some runs it tried to move the green block and confused itself. Feel free to <a href="https://avikde.github.io/vla-pipeline/">try it yourself</a>!</p><div><hr></div><blockquote><p>&#8220;Wave&#8221;</p></blockquote><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;b8000122-8753-40b5-93c0-1138407cab5e&quot;,&quot;duration&quot;:null}"></div><p>This silly task shows that the VLM&#8217;s task understanding goes beyond tabletop manipulation, as it can produce waypoints intended purely for arbitrary motion. However, this demo will only successfully perform horizontal motion due to the 2-dimensional understanding of the VLM (more on that below).</p><h3>What is challenging</h3><p>The principal weaknesses also have to do with the 2-dimensional understanding of the VLM.</p><blockquote><p>&#8220;Stack the blocks&#8221;</p></blockquote><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;2bb92616-d869-49b6-a656-2f4413d73998&quot;,&quot;duration&quot;:null}"></div><p>It correctly moves multiple blocks to the same horizontal position, but does not properly reason about the vertical location of each drop-off. 
This results in the later blocks getting smashed into the ones already placed.</p><h2>The architecture explains strengths and weaknesses</h2><p>The architecture of the <a href="https://avikde.github.io/vla-pipeline/">demo</a> is shown below:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iYxp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iYxp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 424w, https://substackcdn.com/image/fetch/$s_!iYxp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 848w, https://substackcdn.com/image/fetch/$s_!iYxp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 1272w, https://substackcdn.com/image/fetch/$s_!iYxp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iYxp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png" width="1382" height="292" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:292,&quot;width&quot;:1382,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:62158,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/193310864?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iYxp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 424w, https://substackcdn.com/image/fetch/$s_!iYxp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 848w, https://substackcdn.com/image/fetch/$s_!iYxp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 1272w, https://substackcdn.com/image/fetch/$s_!iYxp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Gemini (VLM) blocks are blue, and blocks built using classical methods are green. 
Each layer is independently swappable, and the AI model doesn&#8217;t need to know anything about the robot&#8217;s embodiment. This recreates the modularity of a <a href="https://www.avikde.me/the-architecture-behind-end-to-end">Sense-Plan-Act</a> architecture while retaining the semantic reasoning of a foundation AI model.</p><h3>Vision-Language Model (VLM)</h3><p>The demo uses <a href="https://ai.google.dev/gemini-api/docs/robotics-overview">Gemini ER</a>, whose inclusion I previously motivated <a href="https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics">with a coding agent analogy</a>. Its inputs are the text prompt and a single image, and its outputs are grounded in pixels in the same image. This keeps its behavior well-defined and decoupled from the robot embodiment, solving many of the issues with <a href="https://www.avikde.me/p/debugging-as-architecture-insight">X-VLA in a similar setup</a>.</p><p>However, it builds in a few assumptions that should be acknowledged. 
Most importantly, its understanding of the world is decidedly planar (pixels in the image plane).<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> The view must therefore be chosen to avoid occlusion, parallax-related issues as the camera moves, and tasks that require positioning along the camera axis (like the block-stacking task above).</p><p>Gemini ER can be prompted to output structured JSON, which is easy to work with in downstream layers. The system is first prompted for &#8220;perception&#8221;, which does object detection, semantic classification, and bounding box identification. All of these are very common functions, and easy for this model to complete in ~1 second. An example output for the perception block is below:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">[
   {
      "label":"green block", // &lt;- a name
      "point":[637, 232], // &lt;- position in image coordinates
      "box_2d":[531, 157, 743, 305], // &lt;- image coordinates of bounding box
      "type":"block" // &lt;- semantic classification
   }, // ... other detections 
]</code></pre></div><p>The next step asks the model to plan the motion for a task. We specify an output format that limits the output to &#8220;<a href="https://ai.google.dev/gemini-api/docs/function-calling">calling functions</a>&#8221; that the arm and its lower-level controller are capable of executing. An example output from Gemini (which took anywhere from 4 to 10 seconds):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">[
   {
      "function":"move", // a function that moves the arm
      "args":[584, 753, false], // position (image coords) + 1 bit indicating height
   },
   {
      "function":"setGripperState", // a function that closes or opens the gripper
      "args":[false] // false to close, true to open
   }, // ... other steps
]</code></pre></div><p>The full prompts to get these outputs are <a href="https://github.com/avikde/vla-pipeline/blob/main/web/gemini-er.js">part of the open-source package</a>.</p><h3>Spatial understanding</h3><p>First, we convert the image-plane understanding of the VLM into spatially accurate waypoints that the arm can act on. For this conversion, I also sampled depth values from the camera location (easily reproducible with stereoscopy or a model like <a href="https://arxiv.org/abs/2511.10647">DepthAnything</a>). I chose to use the bounding boxes to isolate the depth values in a region around the object center, and used those to fit primitive shapes to the detections (rendered with wireframes in the videos above). This conversion relies on well-understood camera geometry transformations, and also allows for relocation of the camera, <a href="https://www.avikde.me/p/debugging-as-architecture-insight">unlike in a VLA</a> where the camera geometry is inextricably linked with the rest of the model. The output of this block is 3D waypoints and a representation of the obstacles.</p><p>The object shape affects how well a bounding box captures inlying depth pixels. Gemini also has a native ability to output segmentation masks, which could allow for further refinement in this computational block.</p><h3>Model-based local planner</h3><p>The next part is a model-based local planner that actually generates control signals. This decouples the control rate from the slow runtime of the VLM completely, and no retraining is needed to generate novel motions for new scene compositions. <a href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation">Adaptations for payload</a> could be built into this layer without affecting the VLM.</p><p>For obstacle avoidance, we use a &#8220;potential field&#8221; that pushes the end-effector away from obstacles (you can see orange arrows appearing briefly in the animations above), while pulling it toward the desired goal. 
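</p><p>To make the two classical blocks above more concrete, here is a minimal Python sketch of the ideas; it is not the demo&#8217;s actual JavaScript. It lifts a detection to a 3D point with a pinhole camera model, then takes one potential-field step. The function names, camera intrinsics, gains, sphere-shaped obstacles, and the assumption that points arrive as [y, x] normalized to a 0-1000 range are all illustrative assumptions rather than details taken from the repository.</p>

```python
import math


def pixel_to_camera_point(point_yx, depth_m, width, height, fx, fy, cx, cy):
    """Lift a detection to a 3D point in the camera frame (pinhole model).

    ASSUMPTION: point_yx is [y, x] normalized to the range 0..1000, and
    depth_m is the depth along the optical axis near the object center.
    """
    v = point_yx[0] / 1000.0 * height  # pixel row
    u = point_yx[1] / 1000.0 * width   # pixel column
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def potential_field_step(pos, goal, obstacles, k_att=1.0, k_rep=0.05,
                         influence=0.3, step=0.02):
    """One reactive step: attraction to the goal plus repulsion from obstacles.

    obstacles is a list of (center, radius) spheres. All gains and the
    influence distance are hypothetical values chosen for illustration.
    """
    # Attractive term: pulls straight toward the goal.
    force = [k_att * (g - p) for g, p in zip(goal, pos)]
    for center, radius in obstacles:
        diff = [p - c for p, c in zip(pos, center)]
        dist = math.sqrt(sum(d * d for d in diff))
        clearance = max(dist - radius, 1e-6)
        if clearance >= influence:
            continue  # obstacle is too far away to matter
        # Classic repulsive gradient, growing sharply near the surface.
        mag = k_rep * (1.0 / clearance - 1.0 / influence) / clearance ** 2
        force = [f + mag * d / max(dist, 1e-6) for f, d in zip(force, diff)]
    norm = math.sqrt(sum(f * f for f in force))
    if norm == 0.0:
        return tuple(pos)
    # Move at most `step` along the net force direction.
    scale = min(step, norm) / norm
    return tuple(p + scale * f for p, f in zip(pos, force))
```

<p>In a control loop, a function like potential_field_step would run at the controller rate on the latest waypoint, independent of how long the VLM takes to respond. Like any potential field, this sketch can get stuck in local minima, which is one reason the sampling-based and grid-search planners exist.</p><p>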
Potential fields are a classic reactive <a href="https://modernrobotics.northwestern.edu/nu-gm-book-resource/10-1-overview-of-motion-planning/">motion planning technique</a>, one of a family of well-understood algorithms along with sampling-based and grid-search planners.</p><h2>The interface is crucial</h2><p>The VLA approach leaves no choices to be made about the type of interface: when trained on the same embodiment end-to-end, input pixels get mapped straight through to actions. However, with this hierarchical controller, the choice of interface is quite important. While it resolved many of the drawbacks of the full end-to-end approach that <a href="https://www.avikde.me/p/debugging-as-architecture-insight">held back a demonstration like this</a>, one of the interpretations of the <a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">bitter lesson</a> is that <em>any</em> hand-crafted interface design hampers system performance.</p><p>For example, for grasp generation in this demo, we have to assume that knowing the location of a block is sufficient to produce an action to grasp it. However, different grasping actions may be needed for soft or unusually shaped objects, like eggs or cloth. One possible extension to resolve this is to incorporate a grasp generation module seeded by the object centers and bounding boxes. A VLA directly outputs actions, so it is not limited by this kind of architectural judgment, but it may also require a lot more training data and fail unpredictably when out of distribution.</p><h2>Scoring the criteria from the first article</h2><p>The architecture described above is neither an end-to-end VLA nor a modular model-based one. 
For specificity in this section, I&#8217;ll assume the former camp as being represented by something like Physical Intelligence&#8217;s models (a small version of which we <a href="https://www.avikde.me/p/debugging-as-architecture-insight">tried hands-on with X-VLA</a>), and the latter as being represented by the MIT 2014 Atlas method. Both were discussed in the <a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">first part in this series about end-to-end robotics pipelines</a>.</p><p>All that said, where does this &#8220;hybrid&#8221; hierarchical strategy fall? 
We identified a number of criteria in previous articles, and can try to roughly size up where each falls:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/9cRzg/3/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9bf688c-9add-4d6a-ba7a-a65ef5b80d46_1220x1114.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a286a468-da03-436a-8e59-2923b0911406_1220x1114.png&quot;,&quot;height&quot;:687,&quot;title&quot;:&quot;Scoring end-to-end&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/9cRzg/3/" width="730" height="687" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>My summary would be that the end-to-end method can be the best <em>if it is scaled potentially ad infinitum and has very fast computational hardware</em>, which has practical (data requirements) and efficiency drawbacks. I think the hybrid architecture could be a good middle ground to greatly expand capabilities with less data and added safety and efficiency, but has some bottlenecks from interface choices that may impact some applications (but in a predictable way). 
I&#8217;m open to your thoughts &#8212; let me know below!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/building-a-reasoning-hierarchical/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/building-a-reasoning-hierarchical/comments"><span>Leave a comment</span></a></p><h2>Closing thoughts</h2><p>This demo was put together with models released within the last year, but also with ideas that have existed for decades. We&#8217;ve been seeing transformational improvement in the capabilities of deep neural networks, leading in many cases to large strategic shifts to embrace fully end-to-end architectures. However, this shift brings with it new problems in safety, efficiency, and predictability. This post goes over a proposal for a hybrid architecture that attempts to draw on the strengths of both camps.</p><p>There is room for improvement in end-to-end VLA approaches with scaling, as well as in this kind of hybrid architecture (faster VLM inference, multi-view VLM). <a href="https://itcanthink.substack.com/p/will-world-models-allow-robots-to">&#8220;World model&#8221; methods</a> are rapidly gaining popularity as a component of larger modular pipelines (stay tuned for future posts on this topic). 
I also plan to look more into how to build an &#8220;embodied reasoning&#8221; open-weight VLM in future posts.</p><p><em>Please check out the<strong> <a href="https://avikde.github.io/vla-pipeline/">demo</a></strong>, and the <strong><a href="https://github.com/avikde/vla-pipeline">source code</a></strong>.</em></p><h2>Further reading</h2><p>Other articles in this series:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;63dcb3ae-a9ad-4b7e-8d92-d3e18211ca4f&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The architecture behind &#8220;end-to-end&#8221; robotics pipelines&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-26T21:19:56.368Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!766O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-architecture-behind-end-to-end&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:185869291,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:23,&quot;comment_count&quot;:15,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;9b0c6a02-32e1-4b94-a6e1-e598a5cbfa76&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;\&quot;Is it learning?\&quot; Online motor adaptation in end-to-end robotics&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- 
Robotics Ph.D. and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-03T17:51:24.836Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!x8Re!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/is-it-learning-online-motor-adaptation&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:186635241,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:5,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;ae055054-9c13-4b3a-9b55-871516d6b046&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Debugging as architecture insight: dissecting a VLA&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics 
Ph.D. and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-26T15:46:18.127Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!zyjp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/debugging-as-architecture-insight&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:188827303,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:14,&quot;comment_count&quot;:2,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;8553c497-5203-479b-acb0-6b29e9923dd0&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;A coding agent equivalent for robotics pipelines&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-26T18:18:42.566Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!IGFD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee72e6-b019-4210-ab60-5d852f7b3f90_640x480.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:192049893,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:12,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I wonder if a VLM could be built with stereoscopic vision and some way to associate objects in the two images. Let me know in the comments if you know of anything like this!</p></div></div>]]></content:encoded></item><item><title><![CDATA[A coding agent equivalent for robotics pipelines]]></title><description><![CDATA[Part 4: Closing the action loop with a VLA vs. 
a spatial VLM "agent"]]></description><link>https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics</link><guid isPermaLink="false">https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Thu, 26 Mar 2026 18:18:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IGFD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee72e6-b019-4210-ab60-5d852f7b3f90_640x480.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This article is part of a series on end-to-end robotics pipelines</em></p><ol><li><p><a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">The architecture behind &#8220;end-to-end&#8221; robotics pipelines</a></p></li><li><p><a href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation?r=5vzx85">Online motor adaptation</a></p></li><li><p><a href="https://open.substack.com/pub/minpower/p/debugging-as-architecture-insight?utm_campaign=post-expanded-share&amp;utm_medium=web">VLA debugging insights</a></p></li><li><p>This article</p></li><li><p><a href="https://www.avikde.me/p/building-a-reasoning-hierarchical">Demo combining the best features of end-to-end and classical approaches</a></p></li></ol><div><hr></div><p>In this part, we finally close the loop to get our WidowX robot arm in the MuJoCo simulation to execute some manipulation tasks. 
I&#8217;ll go over how to build up (from scratch) something like the following behavior from a text prompt, and what we can learn about the architecture of robotics pipelines in the process.</p><p><em>Result of &#8220;Place the red block on the blue target&#8221;:</em></p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;b4a8f3ec-b69e-40e1-b2fe-fefb46fcd952&quot;,&quot;duration&quot;:null}"></div><p>An end-to-end Vision-Language-Action (VLA) model is the obvious modern technology <a href="https://open.substack.com/pub/itcanthink/p/vision-language-action-models-and?utm_campaign=post-expanded-share&amp;utm_medium=web">researchers and companies are moving toward</a> for this kind of functionality, and part 3 of this series was dedicated to understanding these models from the inside. The deployment exercise for this part made clear that a small VLA&#8217;s failure modes are difficult, to the point of impossible, to eliminate without retraining.</p><p>That observation ultimately forced a pivot to a different architecture, where the flexible programming and semantic reasoning layer delegates physical grounding to explicitly separate tools. This post explains how that architecture works, and what it says about robotics pipelines more broadly.</p><p>In addition to the story, all the <a href="https://github.com/avikde/vla-pipeline">code is open-source</a>; feel free to learn from it, star it, and fork it!</p><h3>Moravec&#8217;s paradox and VLAs: control bandwidth problem</h3><p>In the previous post, we spent some time interpreting the perception and language understanding in VLAs, specifically the <strong>V</strong>ision and <strong>L</strong>anguage parts. In many ways, the action head is where VLAs show the most architectural diversity.</p><p>There are broadly two types of action heads. <strong>(Auto-)regressive</strong> action heads generate actions sequentially, one at a time. This is similar to how most LLMs work today, generating tokens one after the other. <strong>Generative</strong> (diffusion or flow-matching) action heads, in contrast, generate a whole action sequence at a time and incrementally refine it, similar to diffusion-based image generators.</p><p>Regressive action generators have a fundamental difficulty when used for behavior cloning in continuous action spaces. 
As Max Simchowitz presents in his recent CMU RI seminar<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, the issue is that a small deviation takes the red robot trajectory off the training (expert) demonstration distribution, and it is unable to recover.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EwG8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EwG8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 424w, https://substackcdn.com/image/fetch/$s_!EwG8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 848w, https://substackcdn.com/image/fetch/$s_!EwG8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 1272w, https://substackcdn.com/image/fetch/$s_!EwG8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EwG8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png" width="593" height="286.3179945054945" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:703,&quot;width&quot;:1456,&quot;resizeWidth&quot;:593,&quot;bytes&quot;:280583,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/192049893?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EwG8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 424w, https://substackcdn.com/image/fetch/$s_!EwG8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 848w, https://substackcdn.com/image/fetch/$s_!EwG8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 1272w, https://substackcdn.com/image/fetch/$s_!EwG8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Compounding error with regressive actions (source: Simchowitz RI seminar)</figcaption></figure></div><p>The same problem doesn&#8217;t occur in discrete spaces (like text generation) because they can be trained with a 0-1 or cross-entropy loss function, encouraging very aggressive contraction to the training distribution. Simchowitz identifies this challenge in continuous spaces with Moravec&#8217;s paradox (why learning hasn&#8217;t been as effective in physical tasks as in symbolic tasks like language).</p><p>Action chunking<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> presents a way to get around this problem. 
By producing an action sequence, over which the natural dynamics of the system is assumed to prevent compounding error, the rate of divergence is kept under control:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mx3T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mx3T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 424w, https://substackcdn.com/image/fetch/$s_!mx3T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 848w, https://substackcdn.com/image/fetch/$s_!mx3T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 1272w, https://substackcdn.com/image/fetch/$s_!mx3T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mx3T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png" width="595" height="238.24519230769232" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:583,&quot;width&quot;:1456,&quot;resizeWidth&quot;:595,&quot;bytes&quot;:290357,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/192049893?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mx3T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 424w, https://substackcdn.com/image/fetch/$s_!mx3T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 848w, https://substackcdn.com/image/fetch/$s_!mx3T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 1272w, https://substackcdn.com/image/fetch/$s_!mx3T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Managed error with action chunking (source: Simchowitz RI seminar)</figcaption></figure></div><p>A key assumption there was that the underlying system needs to have some strong 
stability properties. I won&#8217;t go into definitions here, but in practice, this means that VLA actions are almost exclusively desired positions (as opposed to velocity or torque). More generally, this means that the behavior is what is called &#8220;quasi-static&#8221;, i.e. the robot goes through a sequence of statically stable configurations. As an aside, this is also why VLA-implemented manipulation behaviors are slow and wouldn&#8217;t apply to dynamic behaviors like agile locomotion; quoting <a href="https://www.quantamagazine.org/why-do-humanoid-robots-still-struggle-with-the-small-stuff-20260313/">this Quanta magazine article</a>, &#8220;Atlas moves like molasses while grasping auto parts but glides like a gymnast when it&#8217;s not touching anything except the floor&#8221;.</p><p>So, action chunking is a way to address the <strong>control bandwidth problem</strong> from part 1 for regressive policies. Generative action heads don&#8217;t have the same inherent divergence issue, and it does seem like in practice most VLAs use that strategy &#8212; this also applies to our <a href="https://www.avikde.me/p/debugging-as-architecture-insight?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">demo setup with X-VLA</a> from part 3. They learn a distribution over actions, and at inference time, start with a pure &#8220;noise&#8221; action and iteratively denoise it. One thing to note is that the trajectory horizon <a href="https://generalrobots.substack.com/p/robotera-snatches-silver-in-sock/comment/227549490">does not impact how long the inference takes</a> (it is just the size of the action distribution learned during training). 
This means that shortening the action horizon to get faster inference isn&#8217;t an option, as it typically is in model-predictive control.</p><p>Now that we understand VLA action heads a little better, let&#8217;s move on to closing the action loop.</p><h3>Closing the loop with X-VLA: generalization and separation problems</h3><p>The VLA outputs action chunks (a sequence of desired poses), and we now need to control the motors to reach them. The model for the WidowX arm in our simulation is set up for position control on the joints. This is in part due to how most people use this arm (in some cases due to the algorithmic constraints mentioned above). For this article, I chose to keep that as is and, as a first pass, implement the most reasonable control method in this situation: inverse kinematics (IK). The <a href="https://github.com/avikde/vla-pipeline/blob/main/scripts/widowx_control.py">implementation</a> uses gradient descent to iteratively find the joint angles that reach a given pose. This is a generalizable and quick method that will probably get replaced in the last part of the series by a non-IK solution.</p><p>After closing the control loop, prompting the VLA to &#8220;pick up the red block&#8221;, and running the simulation, the result was that it didn&#8217;t work. At this point, I was facing much the same challenge of &#8220;black box debugging&#8221; as in <a href="https://www.avikde.me/p/debugging-as-architecture-insight?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">part 3</a>, but now with more (literal) moving pieces.</p><p>It&#8217;s important to remember that X-VLA is a small VLA, and its generalization capabilities are limited by model size. As we saw in part 3, the model&#8217;s spatial reasoning (how far to reach, when to close the gripper) is tightly coupled to the training camera viewpoints. 
The camera intrinsic and extrinsic parameters are wrapped up in the full X-VLA policy and are not separable<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>, so I tried modifying the images received by the policy to match the training dataset.</p><p>I went into the <a href="https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot/tree/main/videos">BridgeData training dataset</a>, found the most similar task, grabbed its training video, and tried to make my scene resemble it as closely as possible. To do this, I manually tuned the camera position, the robot gripper&#8217;s initial pose and framing (camera extrinsics), the image field of view (intrinsics), the aspect ratio &#8220;squishing&#8221; to match the training data, the lighting and shadows, and the table appearance:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cf9E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cf9E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 424w, https://substackcdn.com/image/fetch/$s_!cf9E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 848w, https://substackcdn.com/image/fetch/$s_!cf9E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 1272w, 
https://substackcdn.com/image/fetch/$s_!cf9E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cf9E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png" width="522" height="281" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:281,&quot;width&quot;:522,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184644,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/192049893?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cf9E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 424w, https://substackcdn.com/image/fetch/$s_!cf9E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 848w, https://substackcdn.com/image/fetch/$s_!cf9E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 1272w, 
https://substackcdn.com/image/fetch/$s_!cf9E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Unfortunately, despite the manual tuning, and also completely decluttering the scene, the policy didn&#8217;t succeed with the prompt <em>&#8220;Pick up the red block&#8221;</em>:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" 
data-attrs="{&quot;mediaUploadId&quot;:&quot;7eb1747d-d2b1-4ffa-9c4b-956d3d65c9c5&quot;,&quot;duration&quot;:null}"></div><p>It consistently overshot the block, which suggested to me a systematic error in the visual processing, but fiddling with the camera settings didn&#8217;t yield a better result. The structural issue with VLAs (non-separability of camera and kinematics parameters) makes this debugging quite challenging, even beyond the techniques from part 3. If you know of anything that could have gotten this to work, let me know in the comments!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics/comments"><span>Leave a comment</span></a></p><p></p><p>I suspect that the generalization abilities of a VLA of this size are simply not sufficient to use the policy zero-shot. There are two reasons why that is a roadblock. First (specific to my usage here), I didn&#8217;t have a leader-follower arm or space mouse to collect more training data and go through a fine-tuning process. Second (and more fundamentally), this limits how this kind of strategy can be used by robot end-users in ad-hoc, unknown environments.</p><p>The flexible task programming and semantic task understanding of VLAs were some of the motivations for this project. 
Is there an alternative solution that can keep those strengths while adding some needed structure?</p><h3>An &#8220;agentic&#8221; modular alternative</h3><p>For scene and task understanding combined with flexible programming, we need some kind of VLM, but is there a way to get information out of the VLM in a more structured way?</p><p>In late 2025, Google released <a href="https://ai.google.dev/gemini-api/docs/robotics-overview">Gemini Robotics 1.5</a>, which consists of two models designed to have a hierarchical interface:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n4LZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n4LZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 424w, https://substackcdn.com/image/fetch/$s_!n4LZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 848w, https://substackcdn.com/image/fetch/$s_!n4LZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!n4LZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!n4LZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png" width="466" height="401.15210355987057" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1064,&quot;width&quot;:1236,&quot;resizeWidth&quot;:466,&quot;bytes&quot;:267144,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/192049893?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F758b443d-1be0-4826-b046-0200c3f2b6fd_1236x1064.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n4LZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 424w, https://substackcdn.com/image/fetch/$s_!n4LZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 848w, https://substackcdn.com/image/fetch/$s_!n4LZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!n4LZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Out of the two, I only used the ER (Embodied Reasoning) model, which has been trained to output structured text combining the spatial understanding and function calling capabilities of the impressive Gemini model family. 
As <a href="https://ai.google.dev/gemini-api/docs/robotics-overview">documented here</a>, the &#8220;pointing&#8221; feature is effectively a customizable vision processing pipeline, and I found it to be incredibly robust:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RARj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RARj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 424w, https://substackcdn.com/image/fetch/$s_!RARj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 848w, https://substackcdn.com/image/fetch/$s_!RARj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 1272w, https://substackcdn.com/image/fetch/$s_!RARj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RARj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png" width="474" height="248.109375" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:335,&quot;width&quot;:640,&quot;resizeWidth&quot;:474,&quot;bytes&quot;:180564,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/192049893?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee72e6-b019-4210-ab60-5d852f7b3f90_640x480.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RARj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 424w, https://substackcdn.com/image/fetch/$s_!RARj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 848w, https://substackcdn.com/image/fetch/$s_!RARj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 1272w, https://substackcdn.com/image/fetch/$s_!RARj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gemini Robotics ER 1.5 &#8220;pointing&#8221; capability, when just presented this image and asked to point out up to 10 objects in the scene.</figcaption></figure></div><p>The function calling capabilities can also be used to break down complex tasks into sub-steps, which is what I used for the working demo in the first section of this article. 
Here you can see that it adapts to different prompts with no other changes:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;fe3f1923-93f1-4529-a960-00bf9106a41a&quot;,&quot;duration&quot;:null}"></div><p>Most shockingly, I spent only a couple of hours with the Gemini models to get to the successful end result above, after unsuccessful attempts over a significantly longer period with X-VLA.</p><p>So, why does this work so much more easily?</p><p>As Simchowitz did in his RI seminar, I&#8217;d offer a pragmatic answer to do with scale, as well as an algorithmic answer independent of it. On model size, Gemini ER 1.5 is described as achieving &#8220;the low latency of a Gemini Flash model&#8221; for spatial tasks, which suggests it&#8217;s Flash-scale (perhaps in the ~8B range) and thus much larger than X-VLA (0.9B). On the algorithmic side, the difficulties we ran into with the VLA often had to do with <strong>inseparability of concerns</strong> (kinematics and calibration parameters not separable from the policy), and <strong>generalizability</strong> (difficult to tell when we were out of distribution).</p><p>I think an appropriate analogy here is between an LLM (even a coding-tuned one) and a coding <em>agent</em> like Claude Code (an LLM in a larger system that can interact with &#8220;tools&#8221;). A coding agent doesn&#8217;t ask the LLM to <a href="https://open.substack.com/pub/engrlog/p/why-skip-the-code-ship-the-binary?utm_campaign=post-expanded-share&amp;utm_medium=web">emit machine code directly</a>; it keeps the model in the semantic reasoning layer and delegates execution to existing well-understood tools. In this analogy, I&#8217;m suggesting that camera calibration, kinematics, and motion controllers are tools that the VLM can benefit from interfacing with. Gemini ER works only on images: a well-defined, separable concern that introduces no variability due to robot morphology. 
Our known camera transformations then lift its image-space outputs into 3D. If we move the camera (impossible with X-VLA without retraining), we can simply replace the camera calibration parameters.</p><p>However, this structural separation appears to contradict the pure end-to-end view that goes back to the &#8220;bitter lesson.&#8221; Overall, in my opinion, the bitter lesson essay has been <a href="https://open.substack.com/pub/minpower/p/the-ai-world-models-debate-and-its?utm_campaign=post-expanded-share&amp;utm_medium=web">interpreted more broadly than current evidence supports</a>, and we will continue to see <a href="https://open.substack.com/pub/robonaissance/p/language-is-poison-part-2-the-bitter?utm_campaign=post-expanded-share&amp;utm_medium=web">reinterpretations</a> and corrections.</p><h3>Closing thoughts</h3><p>In this part of our series on robotics pipelines, we demonstrated a simple setup that exhibits flexible task programming. Our best efforts with an end-to-end VLA fell short; this success came instead from coupling a strong VLM with model-based &#8220;tools&#8221; such as camera geometry and inverse / forward kinematics. This seems to me to reflect some of the strengths of an agent that interacts with tools vs. an equivalent chatbot-style LLM. It certainly provided a clean way to integrate the strengths of a large learning-based model with structured model-based methods &#8212; something I&#8217;d set as a goal in part 1 of this series.</p><p>While this is a nice result, there are still a number of limitations: Gemini&#8217;s task planning is slow, even with cloud hardware. In the current implementation, the full plan is created at startup and there is no replanning for dynamic environments. The model is also not &#8220;open&#8221; and is likely an order of magnitude larger than X-VLA. 
In the future, I may look into what it takes to develop an &#8220;embodied reasoning&#8221; model &#8212; the Gemini ER model appears to build on the ideas of the published <a href="https://arxiv.org/abs/2401.12168">SpatialVLM</a>.</p><p>In the last part of this series, I plan to improve the lower-level controller from its naive IK implementation to show more responsive and <a href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">adaptive</a> behavior. I also aim to publish it in a browser-runnable format so you can quickly see the effects of different prompts. As a reminder, the code is all <a href="https://github.com/avikde/vla-pipeline">open-source</a>.</p><p><em>If you liked this post, please like (&#9825;), share, restack, and subscribe &#8212; it helps others find my writing.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://youtu.be/UX1YXcRnFbs?si=wWY1LMwwtseW79Ku">Simchowitz RI seminar</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2304.13705">ACT paper</a>, whose author is a founder of Sunday Robotics, who in turn have an <a href="https://www.sunday.ai/journal/no-robot-data">ACT-1 foundation model</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>X-VLA has a soft-prompt architecture where embodiment specific parameters are technically separated, but not in an interpretable form.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Lessons from AVs on safety in end-to-end pipelines]]></title><description><![CDATA[Recent developments in autonomous vehicles on recognizing and handling distribution shift]]></description><link>https://www.avikde.me/p/lessons-from-avs-on-safety-in-end</link><guid isPermaLink="false">https://www.avikde.me/p/lessons-from-avs-on-safety-in-end</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Fri, 20 Mar 2026 18:48:15 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!8Xju!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This short post covers a couple of recent updates from the autonomous vehicle (AV) industry with connections to safety in robotics more broadly.</p><h3>Recognizing performance deterioration</h3><p>This <a href="https://www.theverge.com/transportation/897303/tesla-full-self-driving-nhtsa-probe-march-2026">Verge article from March 19</a> reports that there could be an impending recall of Tesla&#8217;s Full Self-Driving (FSD) service. I&#8217;m not interested in judging self-driving capability, but rather in whether the root cause holds lessons for robotics more broadly.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KoIT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KoIT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 424w, https://substackcdn.com/image/fetch/$s_!KoIT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 848w, https://substackcdn.com/image/fetch/$s_!KoIT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KoIT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KoIT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png" width="537" height="214.62760834670948" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:498,&quot;width&quot;:1246,&quot;resizeWidth&quot;:537,&quot;bytes&quot;:124713,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/191604982?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KoIT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 424w, https://substackcdn.com/image/fetch/$s_!KoIT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 848w, 
https://substackcdn.com/image/fetch/$s_!KoIT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 1272w, https://substackcdn.com/image/fetch/$s_!KoIT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">Source: The Verge article linked above. Emphasis mine.</figcaption></figure></div><p>The issue appears to be that the system <strong>didn&#8217;t know when it wasn&#8217;t working well </strong>(causing the issues in the NHTSA filing), or that it did and didn&#8217;t notify the driver (which is unlikely, so we&#8217;ll assume the former).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Xju!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8Xju!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8Xju!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!8Xju!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8Xju!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8Xju!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg" width="583" height="293.10164835164835" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:732,&quot;width&quot;:1456,&quot;resizeWidth&quot;:583,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tesla Full Self-Driving Beta 10.69 barrier&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tesla Full Self-Driving Beta 10.69 barrier" title="Tesla Full Self-Driving Beta 10.69 barrier" srcset="https://substackcdn.com/image/fetch/$s_!8Xju!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8Xju!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!8Xju!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8Xju!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Tesla FSD (<a 
href="https://electrek.co/2026/03/19/nhtsa-upgrades-tesla-fsd-visibility-investigation-3-2-million-vehicles/">source</a>)</figcaption></figure></div><p>This phenomenon isn&#8217;t limited to AVs. The latest article in my Vision-Language-Action (VLA) robotics pipeline series took a hands-on look at <a href="https://www.avikde.me/i/188827303/vla-debugging-ideas-and-techniques">debugging one</a>, and while we found some techniques that can aid developers, they didn&#8217;t directly help at inference time. Item 1 in <a href="https://ruixu.us/posts/six-things-robotics-startup">Rui Xu&#8217;s candid post-mortem</a> of K-Scale Labs mentions the pitfalls of trusting a &#8220;large model&#8221; vs. dedicated safety features. Recent papers on VLAs report fragility when inputs move away from the training distribution (e.g. <a href="https://arxiv.org/html/2506.09930v1">Fang et al., Jun 2025</a>, <a href="https://arxiv.org/html/2512.16760v2">Hu et al., Jan 2026</a>).</p><h3>Potential solutions: redundancy, confidence, architecture</h3><p>NVIDIA recently announced their new Alpamayo model and accompanying AV stack as a reference open model and toolchain. 
During the CES 2026 keynote, Jensen Huang said something intriguing about safety:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MHGN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MHGN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 424w, https://substackcdn.com/image/fetch/$s_!MHGN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 848w, https://substackcdn.com/image/fetch/$s_!MHGN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 1272w, https://substackcdn.com/image/fetch/$s_!MHGN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MHGN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png" width="570" height="302.22527472527474" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:772,&quot;width&quot;:1456,&quot;resizeWidth&quot;:570,&quot;bytes&quot;:203362,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/191604982?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MHGN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 424w, https://substackcdn.com/image/fetch/$s_!MHGN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 848w, https://substackcdn.com/image/fetch/$s_!MHGN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 1272w, https://substackcdn.com/image/fetch/$s_!MHGN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: <a href="https://junkoyoshidaparis.substack.com/p/nvidia-pulling-an-elon-might-have">Junko&#8217;s Tech Probe article</a></figcaption></figure></div><p>This parallel or hybrid architecture, with a classical stack and a policy arbitrator, was also covered in this <a href="https://counterpointresearch.com/en/insights/counterpoint-conversations-nvidia-at-ces-from-full-stack-autonomy-to-an-open-ecosystem-play">Counterpoint Research article</a>. Interestingly, I can&#8217;t find references from NVIDIA themselves to this parallel system other than Jensen&#8217;s keynote; it may simply be early in development.</p><p>A related approach is to have the VLA output some kind of confidence (vs. a separate &#8220;policy arbitrator&#8221;). 
<a href="https://arxiv.org/pdf/2507.17383">Zollo et al (Dec 2025)</a> formalizes the problem of confidence calibration for VLA policies, describes how to extract confidence estimates from contemporary VLA architectures, and notes that current VLAs lack a reliable mechanism for quantifying the uncertainty of their chosen action sequences. It also introduces two potential remedies: prompt ensembles and action-wise Platt scaling.</p><p>Lastly, inserting some debuggable interfaces into end-to-end pipelines can facilitate inspection and safety &#8212; lower-level controllers can apply dedicated safety constraints based on the information passed down from a higher-level controller. This <a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">appears to still be possible</a> in most successful humanoid robotics demonstrations of today due to a combination of factors. Keeping that architectural feature around may have long-standing benefits, based on current events in the AV industry!</p><p></p><p>Thanks for reading! I have been working on the next part of the <a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">end-to-end pipeline series</a>, with a deep dive into the action head and closed-loop behavior. If you liked this post, please share and subscribe.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Systolic arrays for general robotics, AI, and scientific computing]]></title><description><![CDATA[MatMuls dominate today's accelerators, but the original vision was much broader]]></description><link>https://www.avikde.me/p/systolic-arrays-for-general-robotics</link><guid isPermaLink="false">https://www.avikde.me/p/systolic-arrays-for-general-robotics</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Thu, 12 Mar 2026 15:09:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YIQz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The TPU (Tensor Processing Unit), introduced by Google in a whirlwind project ~2015, has now become synonymous with hardware acceleration for deep neural networks. 
I&#8217;ve listed some references below for further reading on the TPU (I&#8217;d especially recommend <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Babbage&quot;,&quot;id&quot;:102722254,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F82525b9c-ee3c-4996-916c-54267a4d354b_416x416.png&quot;,&quot;uuid&quot;:&quot;8da5c836-c587-4146-bce3-64a9c55735ee&quot;}" data-component-name="MentionToDOM"></span>&#8217;s <a href="https://thechipletter.substack.com/p/googles-first-tpu-architecture">historically-situated introduction</a>), but at the core of the TPU is a matrix multiplication unit (MXU) that achieves high-throughput, highly efficient matrix multiplication. Since then, the concept has been integrated into a huge variety of hardware accelerators for neural networks (Groq LPU, NVIDIA Tensor Cores, Apple Neural Engine, Qualcomm Hexagon, and most NPUs), so you may think that it was Google&#8217;s ML inference ambitions that started this <a href="https://thechipletter.substack.com/p/ai-accelerators-the-cambrian-explosion">Cambrian explosion</a> in matrix multiplication acceleration, but that would be almost 40 years off the mark.</p><p>All these matrix multiplication units are based on the systolic array, an architectural concept invented by HT Kung at Carnegie Mellon University in the late <em>1970s</em>. And Kung&#8217;s group didn&#8217;t stop at matrix multiplication; they presented a concept of systolic <em>networks</em> of arbitrary processing <em>nodes</em> that could do way more. 
While some of those concepts appear in niche signal-processing ASICs today, the dominance of deep neural networks over the last decade has caused this history and potential to be significantly overlooked in my opinion.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bf6n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bf6n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Bf6n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Bf6n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Bf6n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bf6n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg" width="267" height="267" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:267,&quot;width&quot;:267,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18185,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bf6n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Bf6n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Bf6n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Bf6n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://seas.harvard.edu/person/ht-kung">HT Kung</a></figcaption></figure></div><p>My interest in this (and the goal of this article) is twofold: (1) Shine a spotlight on this fascinating research and preview the types of problems that can be solved with systolic architectures. (2) Dig into and potentially uncover jumps in performance and efficiency for AI and robotics. 
I believe that holistic full-stack understanding and optimization (bringing together algorithms and hardware) will be key in advancing these technologies.</p><p>Beyond this post, we won&#8217;t stop at a theoretical overview: leveraging the computer engineering experience and storytelling of <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Chip Insights&quot;,&quot;id&quot;:2850528,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/chipinsights&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png&quot;,&quot;uuid&quot;:&quot;59de6f17-98bb-4b82-8beb-9b1104da007d&quot;}" data-component-name="MentionToDOM"></span>, we will actually build up accelerators to use in general-purpose robotics, AI, and scientific applications. We have an article coming soon with the first step, so make sure to subscribe!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h3>Why systolic architectures</h3><p>A systolic architecture is characterized by a network of processing elements (PEs) that feed data to each other instead of going to the memory hierarchy for operands.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YIQz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!YIQz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 424w, https://substackcdn.com/image/fetch/$s_!YIQz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 848w, https://substackcdn.com/image/fetch/$s_!YIQz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 1272w, https://substackcdn.com/image/fetch/$s_!YIQz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YIQz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png" width="1138" height="596" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:596,&quot;width&quot;:1138,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:81201,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/190643644?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c468baf-56ba-42f6-96ee-4b5d88455188_1138x668.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!YIQz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 424w, https://substackcdn.com/image/fetch/$s_!YIQz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 848w, https://substackcdn.com/image/fetch/$s_!YIQz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 1272w, https://substackcdn.com/image/fetch/$s_!YIQz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Systolic array concept from Kung (1982)</figcaption></figure></div><p></p><p>The core benefits are:</p><ul><li><p>It alleviates <strong>memory bottlenecks</strong> by allowing multiple compute operations on each data item without returning to memory (as depicted in the figure above). A well-designed array balances computation time against I/O, so that neither stalls waiting on the other.</p></li><li><p>It yields <strong>simple, regular designs</strong>: a modular setup that can be extended for different functions, with RTL that is relatively easy to write.</p></li><li><p>2D arrays can very easily be <strong>deeply pipelined</strong> (as we will see below), naturally taking advantage of concurrency in the algorithm.</p></li></ul><p>The PE network can be a 1D array (pictured above), a 2D array (the most common today), or another topology for specialized computations. 
</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2-0_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2-0_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 424w, https://substackcdn.com/image/fetch/$s_!2-0_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 848w, https://substackcdn.com/image/fetch/$s_!2-0_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 1272w, https://substackcdn.com/image/fetch/$s_!2-0_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2-0_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png" width="716" height="186" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:186,&quot;width&quot;:716,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2-0_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 424w, https://substackcdn.com/image/fetch/$s_!2-0_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 848w, https://substackcdn.com/image/fetch/$s_!2-0_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 1272w, https://substackcdn.com/image/fetch/$s_!2-0_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Another figure from Kung (1982) &#8212; connections depend on the number of inputs and outputs for each PE.</figcaption></figure></div><p>Data flows between cells in a pipelined fashion, and communication with the outside world is at boundary cells.</p><h3>The foundation of TPU &#8212; a MAC systolic network</h3><div class="captioned-image-container"><figure><a 
class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R98b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R98b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 424w, https://substackcdn.com/image/fetch/$s_!R98b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 848w, https://substackcdn.com/image/fetch/$s_!R98b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 1272w, https://substackcdn.com/image/fetch/$s_!R98b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!R98b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png" width="200" height="240" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:240,&quot;width&quot;:200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7317,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/190643644?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!R98b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 424w, https://substackcdn.com/image/fetch/$s_!R98b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 848w, https://substackcdn.com/image/fetch/$s_!R98b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 1272w, https://substackcdn.com/image/fetch/$s_!R98b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>A <strong>multiply-accumulate (MAC)</strong> PE has two input edges and two output edges. 
In the form drawn below (&#8220;weight-stationary&#8221;), the weight <em>w</em> is a parameter loaded into the PE. The data <em>x</em> flows in from the left and is passed unchanged to the right, and the current &#8220;accumulation&#8221; <em>b</em> flows in from the top (usually from a PE connected to the north). The PE does the multiply-accumulate (<em>x * w + b</em>) and passes the accumulated sum down. We assume that the calculation happens in a single &#8220;tick&#8221; or clock cycle.</p><p>A PE in a systolic network is typically a simple compute primitive. Its power comes from being connected to other PEs, which together express complex calculations.</p><p>The easiest way to understand how a weight-stationary systolic <em>array</em> works is to understand how a <strong>dot product</strong> is computed. This is shown in the following image for 3 cycles, and we will walk through the computation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MCl3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MCl3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 424w, https://substackcdn.com/image/fetch/$s_!MCl3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 848w, https://substackcdn.com/image/fetch/$s_!MCl3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!MCl3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MCl3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp" width="716" height="638" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:638,&quot;width&quot;:716,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13260,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/190643644?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MCl3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 424w, https://substackcdn.com/image/fetch/$s_!MCl3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 848w, https://substackcdn.com/image/fetch/$s_!MCl3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!MCl3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Read this image column-wise, starting from the left.</figcaption></figure></div><p>In each cycle, a new entry of <em>x</em> appears from the left, and one term is added to the dot product. The column of PEs contains a vector of weights. 
After 3 cycles, we have accumulated the full dot product <em><strong>b + w&#183;x</strong>.</em></p><p>We now draw the exact same operation, but in an abridged form (not showing the intermediate calculations and instead just showing the inputs and outputs at the ticks they appear).</p><ul><li><p>A column of the array is drawn as <em>vector</em> weight <em>w<sub>i</sub></em></p></li><li><p>The inputs are drawn as a diagonal (and enter the array skewed in time)</p></li><li><p>The output is shown at the bottom, appearing <em>R</em> cycles after the input hits row 1, where <em>R</em> is the number of PEs in the column</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UyAL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UyAL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 424w, https://substackcdn.com/image/fetch/$s_!UyAL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 848w, https://substackcdn.com/image/fetch/$s_!UyAL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!UyAL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UyAL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp" width="942" height="494" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:494,&quot;width&quot;:942,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15508,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/190643644?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UyAL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 424w, https://substackcdn.com/image/fetch/$s_!UyAL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 848w, https://substackcdn.com/image/fetch/$s_!UyAL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!UyAL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In this form it is easy to see that we are <strong>pipelining</strong> different <em>x<sub>i</sub></em> by starting the second one the cycle after the first one. 
The initial accumulator value can be set to <em>b<sub>i</sub></em>, an affine bias.</p><p>So, with a single-column systolic array, holding a column vector <em>w</em>, we are computing <em><strong>y = b + X&#183;w</strong></em>, where the rows of <em>X</em> are <em>x<sub>1</sub></em>, <em>x<sub>2</sub></em>, &#8230;</p><p>It is also worth noting the latency between when <em>x<sub>i</sub></em> starts entering the array and when we receive the output: the first element of <em>x<sub>1</sub></em> enters the array at time <em>t=1</em>, and we get the result out at <em>t=R</em>, so the latency is <em>R-1</em> cycles.</p><p>Making this a <strong>2D array</strong> (recall that the input x&#8217;s are bypassed to the right from each PE), we see that <em>x<sub>i</sub></em> will just arrive to interact with <em>w<sub>2</sub></em> one cycle later. We can appropriately skew the columns of the <em>B</em> matrix:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X1yw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X1yw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 424w, https://substackcdn.com/image/fetch/$s_!X1yw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 848w, https://substackcdn.com/image/fetch/$s_!X1yw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 
1272w, https://substackcdn.com/image/fetch/$s_!X1yw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X1yw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp" width="960" height="760" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:760,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18578,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/190643644?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X1yw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 424w, https://substackcdn.com/image/fetch/$s_!X1yw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 848w, https://substackcdn.com/image/fetch/$s_!X1yw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 
1272w, https://substackcdn.com/image/fetch/$s_!X1yw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The operation that is executed is <em><strong>Y = B + X&#183;W</strong></em>, where <em>b<sub>ij</sub> </em>above is in row <em>i</em> and column <em>j</em> of <em>B</em>, and <em>W = [w1, w2]</em> is the fixed weight matrix loaded in first. 
If <em>W</em> is <em>n&#215;n</em>, and <em>X</em> is <em>m&#215;n</em>, the matrix product is <em>O(mn<sup>2</sup>)</em> operations (as is standard), but due to the <strong>structurally-enforced pipelining</strong>, it was completed in <em>O(m+n)</em> cycles!</p><p>And just like that, with a very simple MAC-computing PE, we can build up the matrix multiplication hardware unit that is the core of most AI hardware accelerators.</p><p>There is much more to be said about how it is implemented in RTL, how it performs, how the matrix shapes affect utilization, the total latency, throughput and efficiency benefits. We will go over that and intuitive insights in an upcoming <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Chip Insights&quot;,&quot;id&quot;:2850528,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/chipinsights&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png&quot;,&quot;uuid&quot;:&quot;56f3a94e-1fd9-493e-983f-7beedc9b2d68&quot;}" data-component-name="MentionToDOM"></span> post. In the remainder of this article, we will turn our attention to other, more overlooked, uses of systolic networks in applications to broader AI, robotics, and numerical methods.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h3>Moving beyond MAC</h3><p>Keeping the exact same 2D array structure and the skewed input feeding, observe that we had two underlying binary operations: the PE (node) computed <strong>multiply (</strong><em><strong>&#183;</strong></em><strong>)</strong> and the arrow (edge) computed <strong>sum (+)</strong>. In general, the array will compute the result with those operators replaced by any counterparts: <strong>(</strong><em><strong>x<sub>1</sub></strong></em><strong>&#8857;</strong><em><strong>w<sub>1</sub></strong></em><strong>) &#8853; (</strong><em><strong>x<sub>2</sub></strong></em><strong>&#8857;</strong><em><strong>w<sub>2</sub></strong></em><strong>) &#8853; &#8943;</strong></p><p>I&#8217;ll be brief with these and list further reading below, and try to draw special attention to ones that are interesting for applications in robotics and AI.</p><h4>1) Pattern matching (Kung group)</h4><p>Using logical AND (&#8743;) and logical OR (&#8744;) as the operations: <em><strong>y</strong></em><strong> = (</strong><em><strong>x<sub>1</sub></strong></em><strong>&#8743;</strong><em><strong>w<sub>1</sub></strong></em><strong>) &#8744; (</strong><em><strong>x<sub>2</sub></strong></em><strong>&#8743;</strong><em><strong>w<sub>2</sub></strong></em><strong>) &#8744; &#8943;</strong></p><p>For bit vectors, this will return <strong>1</strong> if <em>x</em> and <em>w</em> are both 1 in at least one position, and <strong>0</strong> otherwise. An exact match of <em>x</em> against <em>w</em> would instead use an equality test in the PE and &#8743; along the edge.</p><h4>2) Sorting (Kung group)</h4><p>This one is fascinating and intuitive. 
Each PE performs a simple compare-and-swap operation, and passes the max downward and the min rightward. With <em>n</em> rows and <em>n</em> columns, it will execute the <a href="https://en.wikipedia.org/wiki/Odd%E2%80%93even_sort">odd-even sort</a> algorithm and produce the sorted array.</p><p>A glance at that Wikipedia page reveals both a weakness and a strength of systolic arrays. They can only execute algorithms that work through local connections (odd-even sort takes <em>O(n<sup>2</sup>)</em> operations, more than the best sorting algorithms), but as in matrix multiplication above, the latency is <em>O(n)</em>. While the best comparison sorts take <em>O(n log n)</em> sequential steps on scalar hardware, the systolic network lets a suboptimal algorithm complete with lower latency.</p><h4>3) 2D motion planning</h4><p>Deterministic motion planning (identifying environmental obstacles and planning a path through free areas respecting the system dynamics) is a fundamental problem in robotics. 
About 10 years ago there was even an attempt to <a href="https://spectrum.ieee.org/motionplanning-chip-speeds-robots">build chips to solve this problem</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kdee!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kdee!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 424w, https://substackcdn.com/image/fetch/$s_!kdee!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 848w, https://substackcdn.com/image/fetch/$s_!kdee!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 1272w, https://substackcdn.com/image/fetch/$s_!kdee!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kdee!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png" width="444" height="407.64912280701753" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:684,&quot;resizeWidth&quot;:444,&quot;bytes&quot;:446396,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/190643644?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kdee!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 424w, https://substackcdn.com/image/fetch/$s_!kdee!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 848w, https://substackcdn.com/image/fetch/$s_!kdee!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 1272w, https://substackcdn.com/image/fetch/$s_!kdee!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Grid-based motion planning using dynamic programming (<a href="https://modernrobotics.northwestern.edu/nu-gm-book-resource/10-4-grid-methods-for-motion-planning/">source</a>)</figcaption></figure></div><p>Dynamic programming solutions (including Dijkstra&#8217;s algorithm, A*) can be implemented by local and iterative propagation from the goal, and just as with odd-even sort, the nearest-neighbor connection pattern can be mapped well to a systolic array.</p><p>Unfortunately, the number of grid cells grows exponentially with the dimension of the ambient space, and this is problematic if we need to have one PE per cell. 
This makes systolic motion planning impractical unless we only have a 2D problem to solve, but I think it is an interesting application nonetheless.</p><h4>4) Stereo vision semi-global matching</h4><p>A PE that accumulates matching costs along a scanline can be used to form a systolic array that implements <a href="https://en.wikipedia.org/wiki/Semi-global_matching">semi-global matching</a> (SGM). This algorithm is used to calculate disparity in the very popular <a href="https://github.com/realsenseai/librealsense/discussions/11586">Intel RealSense camera ASICs</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YcYE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YcYE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 424w, https://substackcdn.com/image/fetch/$s_!YcYE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 848w, https://substackcdn.com/image/fetch/$s_!YcYE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 1272w, https://substackcdn.com/image/fetch/$s_!YcYE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!YcYE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png" width="250" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46126588-8883-46e7-8186-8ba28ce42e09_250x250.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:250,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YcYE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 424w, https://substackcdn.com/image/fetch/$s_!YcYE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 848w, https://substackcdn.com/image/fetch/$s_!YcYE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 1272w, https://substackcdn.com/image/fetch/$s_!YcYE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Grid-based SGM depiction (<a href="https://en.wikipedia.org/wiki/Semi-global_matching">source</a>)</figcaption></figure></div><p>SGM systolic arrays on FPGAs run this at pixel rate, processing one scanline per clock, and deterministic low-latency computation is obviously paramount here.</p><h4>5) Matrix decompositions for numerical methods</h4><p>To some extent, I&#8217;ve saved the most promising (at least in my view) for last. Matrix decompositions that aid in factorization are key to solving systems of equations, and this is ubiquitous in all sorts of robotics and general problems.</p><p><strong>5.1) QR decomposition. 
</strong>This matrix factorization is the numerically stable way to solve <strong>least squares or pseudoinverses</strong> in overdetermined systems, and has applications to robot kinematics, SLAM, sensor fusion, online parameter estimation, etc. Additionally, it is a key component of <strong>quadratic program (QP) solvers</strong>: in active set solvers after the active set is identified, for Jacobian factorization in SQP, etc. These workloads are important in <a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">low-level control in robotics</a> and typically need deterministic and low-latency solutions. The Givens rotations method (gentle explainer <a href="https://kwokanthony.medium.com/detailed-explanation-with-example-on-qr-decomposition-by-givens-rotation-6e7bf664fbdd">here</a>) performs local operations on 2x2 submatrices, which lends itself very well to locally-connected CORDIC-implementing PEs in a systolic array.</p><p><strong>5.2) Cholesky decomposition for symmetric positive-definite (SPD) matrices. </strong>This is a slightly easier factorization if the matrix is SPD, which comes up for example in state estimation, Kalman filtering, normal equations in interior point methods, etc. These workloads would come up in dedicated state estimation blocks in robotics pipelines. For decomposing <em>A = LL<sup>T</sup></em> with lower-triangular <em>L</em>, each PE computes one entry of L using only its left and upper neighbors, making the data dependencies purely local. 
The same step is then repeated on the remaining smaller submatrix until the factorization is complete.</p><p>Both of the systolic implementations referred to above use non-MAC PEs and a triangular (not rectangular) network. This topology is very uncommon in current hardware, but it appears in the Kung references above.</p><p>For this post, I wanted to stick to high-level intuitive descriptions, but in the open-source <a href="https://github.com/avikde/tiny-xpu">TinyXPU project</a>, we will aim to implement and analyze some of these non-traditional systolic networks for robotics and AI pipelines. Stay tuned for the upcoming post introducing this project!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h3>Conclusion</h3><p>Systolic arrays were invented earlier than most people think, and they are more general than most people think. They can deliver deterministic (no cache misses) <strong>high throughput and energy efficiency</strong> for algorithms that can work on local data. However, they are a poor fit for sparse data (e.g. sparse linear system solving) and for algorithms that need global data (e.g. Householder QR, which needs to operate on a full matrix column at a time).</p><p>In the deep neural network boom, MAC operations dominate the workload so heavily (&gt;95% of operations in any DNN) that the non-MAC compute takes a tiny fraction of the time. Dedicating a full systolic array with <em>n<sup>2</sup></em> PEs to non-MAC operations would be area-inefficient for neural net workloads. This is why commercial vendors have not explored the co-design of systolic networks with algorithms, including PEs that can do MAC as well as other functions like Givens rotations on one chip. 
For robotics workloads and other general scientific methods, the mix of primitives is different and (in my opinion) worth revisiting.</p><h3>Further reading</h3><ul><li><p><a href="https://www.eecs.harvard.edu/~htk/publication/1982-kung-why-systolic-architecture.pdf">Kung 1982: Why Systolic Architectures</a> - Great high-level overview of the motivation behind systolic architectures</p></li><li><p><a href="https://swh.princeton.edu/~kung/papers_pdf/New%20Folder/VLSI%20Array%20Processors.pdf">Kung 1982: VLSI Array Processors</a> - Further detail on applications such as QR decomposition</p></li><li><p><a href="https://arxiv.org/pdf/1704.04760">Google TPU v1 paper</a></p></li></ul><p>Related posts:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;47f1c92a-14f6-478c-8016-691a6b344522&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The architecture behind &#8220;end-to-end&#8221; robotics pipelines&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-26T21:19:56.368Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!766O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-architecture-behind-end-to-end&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:185869291,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:18,&quot;comment_count&quot;:15,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:187337389,&quot;url&quot;:&quot;https://chipinsights.net/p/mapping-algorithms-to-custom-silicon&quot;,&quot;publication_id&quot;:2850528,&quot;publication_name&quot;:&quot;Chip Insights&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z-fT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png&quot;,&quot;title&quot;:&quot;Mapping algorithms to custom silicon - Part 
1&quot;,&quot;truncated_body_text&quot;:null,&quot;date&quot;:&quot;2026-02-09T00:15:44.482Z&quot;,&quot;like_count&quot;:22,&quot;comment_count&quot;:0,&quot;bylines&quot;:[{&quot;id&quot;:178190448,&quot;name&quot;:&quot;Bharath Suresh&quot;,&quot;handle&quot;:&quot;bharathw&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23b7c14a-5bd1-4a78-9ac8-c5d6eda62bfc_2048x2048.jpeg&quot;,&quot;bio&quot;:&quot;Engineer and Writer&quot;,&quot;profile_set_up_at&quot;:&quot;2024-08-04T01:39:48.025Z&quot;,&quot;reader_installed_at&quot;:&quot;2024-09-23T00:13:37.585Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:2896802,&quot;user_id&quot;:178190448,&quot;publication_id&quot;:2850528,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:2850528,&quot;name&quot;:&quot;Chip Insights&quot;,&quot;subdomain&quot;:&quot;chipinsights&quot;,&quot;custom_domain&quot;:&quot;chipinsights.net&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Semiconductor Industry Deep Dives&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png&quot;,&quot;author_id&quot;:178190448,&quot;primary_user_id&quot;:178190448,&quot;theme_var_background_pop&quot;:&quot;#9A6600&quot;,&quot;created_at&quot;:&quot;2024-08-04T01:42:57.274Z&quot;,&quot;email_from_name&quot;:&quot;Chip Insights&quot;,&quot;copyright&quot;:&quot;Bharath 
Suresh&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;newspaper&quot;,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}},{&quot;id&quot;:3076811,&quot;user_id&quot;:178190448,&quot;publication_id&quot;:3023929,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:3023929,&quot;name&quot;:&quot;Bharath&#8217;s Musings&quot;,&quot;subdomain&quot;:&quot;bharathw&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;A place for my thoughts&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/afa88b37-7ced-4dd5-bdcb-580f7442001d_608x608.png&quot;,&quot;author_id&quot;:178190448,&quot;primary_user_id&quot;:null,&quot;theme_var_background_pop&quot;:&quot;#FF6719&quot;,&quot;created_at&quot;:&quot;2024-09-16T02:30:59.184Z&quot;,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Bharath Suresh&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;newspaper&quot;,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null,&quot;status&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:null,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:null,&quot;paidPublicationIds&quot;:[],&quot;subscriber&quot;:null}},{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik 
De&quot;,&quot;handle&quot;:&quot;avikde&quot;,&quot;previous_name&quot;:&quot;Avik&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. and founder&quot;,&quot;profile_set_up_at&quot;:&quot;2025-09-01T11:05:25.762Z&quot;,&quot;reader_installed_at&quot;:&quot;2025-12-14T02:43:43.888Z&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;status&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:1,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:{&quot;type&quot;:&quot;subscriber&quot;,&quot;tier&quot;:1,&quot;accent_colors&quot;:null},&quot;paidPublicationIds&quot;:[1063960],&quot;subscriber&quot;:null},&quot;primaryPublicationId&quot;:7287367,&quot;primaryPublicationName&quot;:&quot;min{power}&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://www.avikde.me&quot;,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://www.avikde.me/subscribe?&quot;}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;,&quot;source&quot;:null}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://chipinsights.net/p/mapping-algorithms-to-custom-silicon?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!Z-fT!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png" loading="lazy"><span class="embedded-post-publication-name">Chip Insights</span></div><div 
class="embedded-post-title-wrapper"><div class="embedded-post-title">Mapping algorithms to custom silicon - Part 1</div></div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">4 months ago &#183; 22 likes &#183; Bharath Suresh and Avik De</div></a></div>]]></content:encoded></item><item><title><![CDATA[Debugging as architecture insight: dissecting a VLA]]></title><description><![CDATA[Part 3: Hands-on debugging of a vision-language-action model as a lens into architecture, safety, and verifiability]]></description><link>https://www.avikde.me/p/debugging-as-architecture-insight</link><guid isPermaLink="false">https://www.avikde.me/p/debugging-as-architecture-insight</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Thu, 26 Feb 2026 15:46:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zyjp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This article is part of a series on end-to-end robotics pipelines. 
I&#8217;d recommend at least reading part 1 after this article.</em></p><ol><li><p><a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">The architecture behind &#8220;end-to-end&#8221; robotics pipelines</a></p></li><li><p><a href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation?r=5vzx85">Online motor adaptation</a></p></li><li><p>This article</p></li><li><p><a href="https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">Closing the action loop with a VLM &#8220;agent&#8221;</a></p></li><li><p><a href="https://www.avikde.me/p/building-a-reasoning-hierarchical">Demo combining the best features of end-to-end and classical approaches</a></p></li></ol><div><hr></div><p>In this part, we get hands-on and build a VLA pipeline from scratch. I&#8217;ll be transparent about my starting point: while I have experience with model-based methods, RL controllers, and LLMs/VLMs, generalist end-to-end policies &#8212; almost exclusively being realized today as Vision-Language-Action (VLA) models &#8212; were new territory. That makes this post a useful vantage point to evaluate their strengths and weaknesses from first principles, and should be interesting to those who have never heard of VLAs as well as those who use them daily.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h3>&#8220;Pick up the red block&#8221;</h3><p>The demo is simple: take a specified prompt (like the one in the heading above), run it through the model, and visualize the actions that the model outputs. Obviously, when it is run in closed loop, you would get motion that hopefully results in the action described by the prompt, but there was so much to dig into with just this visualization that it made sense to spend an article on it. In the next part, we will close the action loop and explore some of the low-level controller facets mentioned in part 1.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zyjp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zyjp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 424w, https://substackcdn.com/image/fetch/$s_!zyjp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 848w, 
https://substackcdn.com/image/fetch/$s_!zyjp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 1272w, https://substackcdn.com/image/fetch/$s_!zyjp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zyjp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png" width="490" height="348.4248424842484" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:790,&quot;width&quot;:1111,&quot;resizeWidth&quot;:490,&quot;bytes&quot;:245031,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6565620c-f18d-49d5-b581-9cd7f5732c26_1111x1001.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!zyjp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 424w, 
https://substackcdn.com/image/fetch/$s_!zyjp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 848w, https://substackcdn.com/image/fetch/$s_!zyjp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 1272w, https://substackcdn.com/image/fetch/$s_!zyjp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In the following animation, the configuration of the arm is changed using the sliders (with the prompt held fixed), showing that the output action is responsive to the robot and environment state.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;e1d0a884-a0a4-486f-b083-02fcc52043b7&quot;,&quot;duration&quot;:null}"></div><p>The learning journey for this article is captured in a Jupyter notebook that can be accessed and run for free on Colab (<a href="https://colab.research.google.com/github/avikde/vla-pipeline/blob/main/xvla_widowx_vis_traj.ipynb">click here</a>). All details on the software stack are in the <a href="https://github.com/avikde/vla-pipeline">open-source GitHub repository for this project</a> (which also hosts the notebook). If it is a helpful learning tool or template, I&#8217;d welcome any feedback, fixes, contributions, stars, forks, etc.</p><p>First, let&#8217;s quickly go over what a VLA is.</p><h3>Anatomy of a Vision-Language-Action (VLA) model</h3><p>A Vision-Language-Action model has three functional components: a vision encoder, a language encoder, and an action head. In practice, the vision and language encoders are almost always a single pretrained VLM, i.e. the vision and language processing are already jointly trained before the action head is added. This means the &#8220;vision encoder&#8221; and &#8220;language encoder&#8221; aren&#8217;t independently tunable modules; they&#8217;re entangled by pretraining.</p><p>The architecturally interesting variation is in how the action head attaches to the VLM, and how much of the VLM is modified during robot training. 
This single design choice has large downstream consequences for what you can and cannot inspect at inference time.</p><h4>VLA &#8220;action head&#8221; architectures</h4><p>Two illuminating (but not exhaustive) designs:</p><p><strong><a href="https://octo-models.github.io/">Octo</a></strong> uses a dedicated readout token &#8212; a learned embedding (~384-dim) that aggregates action-relevant information from the transformer before a small decode network produces actions. This bottleneck is the closest thing to an inspectable interface in any current VLA: you can probe whether the readout encodes directional intent, object identity, or nothing interpretable.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;43b164dd-131b-4071-b3b0-869496610567&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Transformer &#8594; readout_action embedding (384-dim)
                        &#8595;
            Action Head (direct decode)
                        &#8595;
                    Actions</code></pre></div><p><strong><a href="https://thu-air-dream.github.io/X-VLA/">X-VLA</a></strong> processes images, language, proprioception, and noisy action candidates together in a single 24-layer transformer, conditioned by 32 learnable soft prompt tokens selected per embodiment. Flow matching then iteratively refines the action chunk over 10 steps. Action-relevant information is distributed across all layers and token types simultaneously.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;4d5e3b02-48c6-4434-8b2d-3edee2a6b173&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Input: Images + Language + Proprio + Domain ID
               &#8595;
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;  Soft Prompt Selection (per embodiment)   &#9474;
&#9474;  Domain 0 &#8594; Prompt_0 (32 learnable tokens)&#9474;
&#9474;  Domain 1 &#8594; Prompt_1 (32 learnable tokens)&#9474;
&#9474;  Domain N &#8594; Prompt_N (32 learnable tokens)&#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
               &#8595;
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;  Unified Transformer Stack (24 layers)   &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9474;
&#9474;  &#9474; [Soft Prompt | Vision | Lang |     &#9474;  &#9474;
&#9474;  &#9474;  Proprio | Noisy Actions]          &#9474;  &#9474;
&#9474;  &#9474;                                    &#9474;  &#9474;
&#9474;  &#9474;  All processed together with       &#9474;  &#9474;
&#9474;  &#9474;  standard self-attention           &#9474;  &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
                &#8595;
       Flow Matching (10 steps)
                &#8595;
       Action Chunk (32 actions)</code></pre></div><p>The soft prompts enable efficient cross-embodiment adaptation: only ~9M parameters (1% of the model) need updating for a new robot. But they also mean embodiment-specific behavior is encoded in vectors with no interpretable structure.</p><p>The deeper point applies to both architectures: even where a vector interface exists between components (Octo&#8217;s readout token, X-VLA&#8217;s soft prompts), end-to-end training means those vectors don&#8217;t have a physical interpretation that safety constraints can be applied to.</p><p>There is more to be said on action chunking and control bandwidth, which I&#8217;ll plan to do in the next part of the series.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><p></p><h4>Model choice</h4><p>I approached this as a user rather than a researcher: published weights only, no training data collection, and no fine-tuning iterations before deployment. The target application is tabletop pick-and-place on a WidowX, which is a common manipulation benchmark and exposes the control and perception properties I care about. Another soft constraint was that I&#8217;d be able to run it on my personal laptop (12GB VRAM).</p><p>These three criteria limit which VLAs can be tried. <a href="https://huggingface.co/openvla/openvla-7b">OpenVLA-7B</a> requires task-specific fine-tuning and won&#8217;t fit in 12GB without quantization. <a href="https://huggingface.co/docs/lerobot/en/pi0">&#960;0</a> needs 24GB+. <a href="https://github.com/NVIDIA/Isaac-GR00T/blob/main/getting_started/hardware_recommendation.md">GR00T</a> requires a Jetson Thor. 
<a href="https://deepmind.google/blog/gemini-robotics-on-device-brings-ai-to-local-robotic-devices/">Gemini Robotics On-Device</a> is trained on dual-arm configurations and isn&#8217;t publicly accessible. Octo (93M params) fits the hardware, but its pretraining doesn&#8217;t support zero-shot transfer, so it would need fine-tuning. <a href="https://huggingface.co/docs/lerobot/en/smolvla">SmolVLA</a> likewise requires fine-tuning.</p><p>X-VLA seems to fit the bill. Its soft-prompt architecture was designed for cross-embodiment zero-shot transfer, and <a href="https://huggingface.co/lerobot/xvla-widowx">xvla-widowx</a> provides a checkpoint fine-tuned on BridgeData for the WidowX embodiment specifically, meaning embodiment adaptation is handled, while task generalization remains zero-shot. It also has an <code>ee6d</code> (end-effector coordinates) action output mode, which appealed to me because it would remove kinematics-related variability.</p><h3>What&#8217;s different about VLAs: task programming</h3><p>VLAs have been heralded as revolutionary for robotics, and there is truth to that: programming robots in natural language is a decided shift. In my own fielded robotics experience at Ghost Robotics, customers would either (a) directly command the robot or (b) pick between preprogrammed tasks (which can be considered a fixed small vocabulary of commands); otherwise, (c) the robot would start its own tasks. Natural language commands expand the set of available tasks <em>without retraining or reprogramming</em>. The natural language interface changes <em>who</em> can program a robot, not just <em>what</em> it can do. With a VLA, a non-technical operator can in principle specify novel tasks.</p><p>The flip side is worth stating fairly: natural language as an interface trades a small precise vocabulary (preprogrammed tasks) for a large ambiguous one. 
&#8220;Pick up the red block&#8221; sounds more expressive than running the &#8220;pick_red&#8221; preprogrammed task, but as the next section will show, the boundary of what the model actually understands is opaque in a way that a fixed command vocabulary is not.</p><h3>What&#8217;s different about VLAs: calibration and debugging</h3><p>With classical methods, the process of setting up and debugging a task includes several well-delineated steps:</p><ul><li><p>calibrate cameras &#8594; check camera detection overlay &#8594; perception &#9989;</p></li><li><p>calibrate joints &#8594; send arm &#8220;move up&#8221; command and ensure it moves as expected &#8594; actuators &#9989;</p></li></ul><p>With VLAs, there are a few reasons why this kind of unit testing or debugging is simply not possible.</p><ol><li><p>Parameters such as camera extrinsics or joint torque constants cannot be isolated: models are typically trained on datasets with multiple camera angles and no explicit calibration, and the network learns spatial transforms end-to-end.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> Another example: swapping the camera lens for a fisheye for a wider FOV won&#8217;t generalize without retraining, unlike traditional vision where you just recalibrate intrinsics.</p></li><li><p>There aren&#8217;t obvious equivalents of non-end-to-end interfaces such as the camera detection overlay or a &#8220;move up&#8221; command, but we will try to come up with methods to work around this in the next section.</p></li><li><p>Randomness: the trajectory varies even when the environment does not change. Flow matching stochasticity is in the action head specifically; the VLM backbone is deterministic given the same input. X-VLA uses 10-step flow matching. 
Even with the same seed, limited numerical precision in GPU ops causes the trajectory to drift by step 5-6.</p></li></ol><p>Because of points 1 and 2 (slightly exacerbated by 3), it can be hard to reason about the root cause of a failure. Is a failure due to (a) vision error, (b) action discretization error, (c) world model mismatch, or (d) all three? When can you dismiss a failure as being out of distribution, and when can you not?</p><p>For a developer, this ambiguity is an inconvenience; with enough time, you can run more experiments and form hypotheses (as we do in the next section). For a deployed system in customer hands, the same ambiguity becomes a safety concern: the robot has no reliable mechanism to detect that it is out of distribution and should stop. Classical systems fail loudly (joint limit hit, object not detected, planner infeasible); VLAs fail silently, producing plausible-looking but wrong trajectories. This isn&#8217;t a criticism of VLAs specifically; it is a structural consequence of end-to-end training, and it applies equally to any system where the failure boundary is defined implicitly by a training distribution rather than explicitly by an engineer.</p><h3>VLA debugging ideas and techniques</h3><p>Despite the structural challenges mentioned above, I had a fascinating experience coming up with ways to probe and understand what the VLA was doing.</p><h4>Passive debugging: inspect what the model is already computing</h4><ol><li><p><strong>Interpret VLM output (infeasible). </strong>My first instinct was to query the VLM backbone directly, e.g. by asking something like &#8220;Is there a red cube?&#8221; or &#8220;What objects are on the table?&#8221; to verify perception. This turns out not to be feasible for most architectures. In X-VLA and SmolVLA, the action head attaches to the VLM&#8217;s final hidden states and generates actions through flow matching in a continuous space, bypassing the text vocabulary entirely. You could query the underlying base VLM (e.g. 
<a href="https://huggingface.co/blog/smolvla#vision-language-model-vlm">SmolVLM2 for SmolVLA</a>) separately, but that&#8217;s not a fair proxy: fine-tuning on robot manipulation data shifts the VLM&#8217;s internal representations, so its text generation behavior no longer reflects what the VLA backbone actually sees. This technique only works cleanly in text-token VLAs like <a href="https://robot-learning-collective.github.io/vla-0-smol">VLA-0-Smol</a>, where actions are generated as autoregressive text strings from the same output head as language. There, scene description quality and action quality share a representation, so if the model produces a poor scene description, it will likely produce poor action tokens.</p></li><li><p><strong>Visualize attention on tokens.</strong> The ubiquity of transformer-based architectures means that we can leverage the <a href="https://huggingface.co/docs/transformers/en/model_doc/encoder-decoder">Hugging Face Transformers output_attentions</a> feature to try to visualize where the vision and text encoders are spending their attention, and whether that is appropriate for the specified task. For example, if we ask it to pick up a red block, is the vision encoder indeed looking at the red block?</p></li></ol><h4>Active debugging: intervene on inputs and observe behavioral change</h4><ol><li><p><strong>Camera ablations (test whether vision is doing object detection or spatial template matching).</strong> Move the camera position, and introduce occlusions into one of the views if there are multiple. If the attention fails to track the desired object, it suggests the model learned spatial heuristics tied to camera geometry rather than object identity. 
In a classical pipeline, object detection is camera-pose-invariant by design (you&#8217;d re-project into the robot frame), but here, camera pose is baked into the learned policy implicitly through the training distribution.</p></li><li><p><strong><a href="https://www.emergentmind.com/topics/counterfactual-prompt-design">Counterfactual prompting</a> to test semantic understanding.</strong> Use variations of the prompt (e.g. red block vs. red cube) that effectively mean the same thing and observe whether the output stays consistent. Differing outputs would expose that the action head is sensitive to tokenization differences that the VLM alone would smooth over.</p></li><li><p><strong>Primitive action prompts (test the action head&#8217;s semantic understanding of motion).</strong> For example, if &#8220;don&#8217;t move&#8221; produces as much motion as &#8220;pick up block&#8221;, it shows that the action head is always generating motion from its training distribution, rather than holding a deeper understanding of what motion is.</p></li></ol><p>I suspect that some (if not all) of these will be familiar to seasoned VLA users, but please let me know in the comments if you&#8217;re aware of a better technique. Chances are that it will help many prospective and current VLA users!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/debugging-as-architecture-insight/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.avikde.me/p/debugging-as-architecture-insight/comments"><span>Leave a comment</span></a></p><h3>Debugging results</h3><p>For each experiment, I&#8217;ll write what a reasonable expectation would be, the result we see, and the resulting insight or the deeper reason why.</p><h4>Baseline: pick up the red block</h4><p>In this baseline, the attention map on the image 
appears to be concentrated on the red block as well as the gripper. The output reaching trajectory moves to directly over the red block. Overall, this looks like a great initial result.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NfNw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NfNw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!NfNw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!NfNw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!NfNw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NfNw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/adcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:260352,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!NfNw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!NfNw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!NfNw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!NfNw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>Experiment 1: Picking a different block in view</h4><p><strong>Expectation: </strong>Symmetric action based on spatial understanding from multiple views</p><p><strong>Result: </strong>The visualized attention shows that it is looking at approximately the correct part of the primary image, though it appears a little offset to the outside of the block. 
The reaching action appears to not reach as far toward the blue block.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tsjO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tsjO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!tsjO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!tsjO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!tsjO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tsjO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263368,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!tsjO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!tsjO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!tsjO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!tsjO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Insight: </strong>Most likely, the &#8220;3D&#8221; spatial understanding from the images is not what we would expect from a precisely calibrated perception and object identification setup.</p><h4>Experiment 2: Swap blue / red positions</h4><p><strong>Expectation: </strong>Behavior symmetric to the previous experiment.</p><p><strong>Result: </strong>The blue trajectory overshoots more than the initial red block trajectory, and the red trajectory overshoots.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sCNd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sCNd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!sCNd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!sCNd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!sCNd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sCNd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:264576,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!sCNd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!sCNd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!sCNd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!sCNd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UUKw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UUKw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 424w, 
https://substackcdn.com/image/fetch/$s_!UUKw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!UUKw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!UUKw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UUKw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:264223,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!UUKw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!UUKw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!UUKw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!UUKw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Insight: </strong>Spatial understanding and behavior are not symmetric when they are expected to be, indicating a larger influence from factors such as the training data distribution.</p><h4>Experiment 3: Altered primary camera view</h4><p><strong>Expectation: </strong>Same behavior as the initial camera view.</p><p><strong>Result: </strong>The red block trajectory now exhibits the under-reaching previously seen in the blue trajectory, and vice versa.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e3lu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e3lu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!e3lu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!e3lu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 1272w, 
https://substackcdn.com/image/fetch/$s_!e3lu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e3lu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:249229,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!e3lu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!e3lu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 848w, 
https://substackcdn.com/image/fetch/$s_!e3lu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!e3lu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!dKBg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dKBg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!dKBg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!dKBg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!dKBg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dKBg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:249035,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!dKBg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!dKBg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!dKBg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!dKBg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Insight: </strong>The actions are inseparably tied to the camera view and not associated with absolute spatial understanding.</p><h4>Experiment 4: Remove second camera view</h4><p><strong>Expectation: </strong>Slight degradation in performance.</p><p><strong>Result: </strong>Removing the side view has minimal effect, but removing the over-the-shoulder view has a disastrous effect on performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iFFG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!iFFG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!iFFG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!iFFG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!iFFG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iFFG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d6d97802-2a8f-4f16-b595-86523141105c_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262220,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!iFFG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!iFFG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!iFFG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!iFFG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XlS0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XlS0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!XlS0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!XlS0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!XlS0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XlS0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262235,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!XlS0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!XlS0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!XlS0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!XlS0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>It appears that <a href="https://rail-berkeley.github.io/bridgedata/">BridgeData</a> has a disproportionately high number of trials with the over-the-shoulder view, and significantly altered viewpoints may silently produce much worse results. 
</p><h4>Experiment 5: Occluded primary view</h4><p><strong>Expectation: </strong>Second view provides redundancy.</p><p><strong>Result: </strong>Trajectory moves away from the red block.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iEYg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iEYg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!iEYg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!iEYg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!iEYg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iEYg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262599,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!iEYg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!iEYg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!iEYg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!iEYg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>The side camera view does not seem to be useful to X-VLA; it did not provide the expected redundancy.</p><h4>Experiment 6: Prompt variations</h4><p><strong>Expectation: </strong>Prompts with similar meanings will produce similar actions.</p><p><strong>Result: </strong>All of these prompts resulted in largely similar actions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b8o3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!b8o3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!b8o3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!b8o3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!b8o3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b8o3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/baf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262707,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!b8o3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!b8o3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!b8o3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!b8o3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8733!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8733!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!8733!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!8733!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!8733!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8733!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262482,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8733!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!8733!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!8733!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!8733!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UxGP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UxGP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 424w, 
https://substackcdn.com/image/fetch/$s_!UxGP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!UxGP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!UxGP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UxGP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263062,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UxGP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 
424w, https://substackcdn.com/image/fetch/$s_!UxGP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!UxGP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!UxGP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IeOo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IeOo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!IeOo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!IeOo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!IeOo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IeOo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262328,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IeOo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!IeOo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!IeOo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!IeOo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>The language encoder is effective at collapsing equivalent prompts to the same actions.</p><h4>Experiment 7: Don&#8217;t move</h4><p><strong>Expectation: </strong>No motion.</p><p><strong>Result:</strong> Approximately as much motion as when asked to pick the red cube with the left shoulder view.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qWia!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!qWia!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!qWia!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!qWia!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!qWia!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qWia!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261100,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!qWia!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!qWia!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!qWia!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!qWia!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>The model is still interpolating or extrapolating from training samples and does not have an explicit understanding of commands.</p><h4>Experiment 8: Change picking position</h4><p><strong>Expectation: </strong>The output trajectory moves to the modified block position.</p><p><strong>Result: </strong>Strangely, the visual attention is not on the block in the second example, but the trajectory is largely responsive to the environment change.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Dn7O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Dn7O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!Dn7O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!Dn7O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Dn7O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Dn7O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:258635,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Dn7O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!Dn7O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 848w, 
https://substackcdn.com/image/fetch/$s_!Dn7O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!Dn7O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!PAWb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PAWb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!PAWb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!PAWb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!PAWb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PAWb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261359,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!PAWb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!PAWb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!PAWb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!PAWb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>As long as the prompt is visually grounded, the results generalize in the expected way. 
The soft prompt for WidowX likely encodes &#8220;approach visible object&#8221; as a primitive (trained on the Bridge dataset).</p><h4>Experiment 9: Move forward / backward / up / down</h4><p><strong>Expectation:</strong> Move as asked.</p><p><strong>Result: </strong>Approximately the same motion toward the tabletop, largely uncorrelated with the prompt.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N4k4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N4k4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!N4k4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!N4k4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!N4k4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N4k4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261575,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N4k4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!N4k4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!N4k4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!N4k4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EiXO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EiXO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 424w, 
https://substackcdn.com/image/fetch/$s_!EiXO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!EiXO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!EiXO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EiXO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:265645,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EiXO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 
424w, https://substackcdn.com/image/fetch/$s_!EiXO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!EiXO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!EiXO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dNob!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dNob!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!dNob!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!dNob!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!dNob!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dNob!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262285,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!dNob!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!dNob!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!dNob!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!dNob!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>No visual grounding for blind motions. 
The model likely has no spatial primitive vocabulary because VLMs are trained on image-caption pairs in which &#8220;up&#8221; describes scene composition, not a direction in the robot workspace.</p><h4>Experiment 10: Move toward / away from base</h4><p><strong>Expectation: </strong>Motion toward or away from the base, as instructed.</p><p><strong>Result: </strong>The motion differs discernibly between the two trials, in line with the instruction, suggesting some comprehension of the prompt.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gCGK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gCGK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!gCGK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!gCGK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!gCGK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gCGK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261494,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gCGK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!gCGK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!gCGK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!gCGK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B8R1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!B8R1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!B8R1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!B8R1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!B8R1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B8R1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:266299,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!B8R1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!B8R1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!B8R1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!B8R1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Insight: </strong>The introduction of the robot base as a (visible) target makes things significantly easier for the model compared to the previous experiment.</p><h4>Experiment 11: Move away from block</h4><p><strong>Expectation: </strong>Motion away from the block.</p><p><strong>Result: </strong>Motion largely toward the tabletop.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5mc5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5mc5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!5mc5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!5mc5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 1272w, 
https://substackcdn.com/image/fetch/$s_!5mc5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5mc5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:266131,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5mc5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!5mc5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!5mc5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 
1272w, https://substackcdn.com/image/fetch/$s_!5mc5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Insight: </strong>The word &#8220;away&#8221; is probably not having the spatial effect that it should in this context, exposing the ambiguity inherent in using language for robot programming. 
Whether we like it or not, unless the language model is very large, it is safer to assume that the prompt effectively indexes into or extrapolates from the training data, and that spatial prepositions (which humans commonly use to communicate spatial commands) are unreliable.</p><h3>What the experiments reveal about current VLAs</h3><p><strong>1. Camera view is tied to the behavior, not a calibrated parameter.<br></strong>Experiments 2, 3, 4, and 5 collectively show that the model&#8217;s spatial behavior is tied to the training distribution&#8217;s camera geometry rather than to a camera-pose-invariant object representation. Swapping shoulders changes reach distance; replacing the over-shoulder view with a side view breaks the policy entirely, even though the scene is identical. This is to be expected of any VLA trained end-to-end without explicit camera calibration. The practical implication is that deployment requires camera placement matching the training distribution, and that the model will fail silently when out of distribution.</p><p><strong>2. The action manifold is object-centric, not spatially general.<br></strong>Experiments 7, 9, 10, and 11 collectively show that the model has no spatial primitive vocabulary independent of objects. &#8220;Move up/forward/back&#8221; all produce similar grasping-like motions; &#8220;don&#8217;t move&#8221; produces motion; &#8220;move away from block&#8221; produces motion largely toward the tabletop rather than away from the block. &#8220;Move toward/away from base&#8221; works only because the base is a visually grounded object in the scene. This likely generalizes beyond X-VLA: any VLA at this scale trained predominantly on pick-and-place demonstrations can be expected to have an action manifold that approximates &#8220;move toward salient object and grasp.&#8221; Spatial relation commands only work when they can be reduced to object identity. 
This has a direct safety implication: you cannot issue a recovery command (&#8220;stop,&#8221; &#8220;move away,&#8221; &#8220;back off&#8221;) and expect it to override the trained behavioral prior.</p><p><strong>3. VLAs at this scale appear to lack compositional generalization.</strong><br>Experiments 7, 9, and 11 show that novel combinations of spatial primitives and objects (even using vocabulary the model demonstrably knows) produce behavior dominated by the training distribution rather than the instruction. This is distinct from the question of whether larger VLAs generalize better, which is likely true but out of scope for this article. It does suggest, however, that for sub-1B-parameter VLAs, natural language commands are most reliable when they closely match the task distribution the model was trained on, which significantly narrows the practical definition of &#8220;zero-shot generalization&#8221; for deployment.</p><h3>Closing thoughts</h3><p>For flow-matching VLAs like X-VLA, the classical debugging question &#8220;is this a vision problem or a control problem?&#8221; is not just difficult to answer but structurally unanswerable. End-to-end training eliminates the interfaces that would make the question meaningful.</p><p>The debugging ideas presented here offer partial remedies: passive inspection via attention visualization and active intervention via camera ablations and language variation. These experiments also surfaced three concrete findings: spatial understanding is tied to training-distribution camera geometry rather than calibrated object pose; the action manifold is object-centric and lacks spatial primitive vocabulary; and compositional generalization breaks down for novel combinations of known concepts. 
These are echoes of the <a href="https://open.substack.com/pub/aisnakeoil/p/new-paper-towards-a-science-of-ai?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">reliability concerns of consistency, robustness, predictability, and safety</a> that are crucial for evaluating progress in robotics.</p><p>None of this diminishes what VLAs actually deliver: flexible task programming and meaningful robustness to environmental variation, without any robot-specific programming. The path to reliable deployment lies in augmenting the strengths of VLAs with explicit interfaces for safety constraints, reducing complexity by using established tools for camera and kinematics calibration, and adding out-of-distribution detection.</p><p>In the next part, we will close the loop with this demo&#8217;s action outputs to try to leverage the strengths of VLAs in conjunction with low-level control ideas from parts 1 and 2.</p><p>If you liked this kind of analysis, please subscribe for future posts, and thanks for reading!</p><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>In fact, as mentioned above, the slightly variable trajectory start point makes me suspect an error somewhere, but it is quite difficult to narrow down further, even after checking <a href="https://github.com/2toinf/X-VLA?tab=readme-ov-file#5%EF%B8%8F%E2%83%A3-standardized-control-interface-ee6d">the documentation</a> and opening <a href="https://huggingface.co/lerobot/xvla-widowx/discussions/2">an issue</a>. 
However, this isn&#8217;t a fundamental VLA issue and I&#8217;m going to put it aside for this article.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[What Wiener knew about (artificial) intelligence in 1948]]></title><description><![CDATA[Cybernetics anticipated feedback, structure, and the human stakes of machine intelligence with unsettling precision]]></description><link>https://www.avikde.me/p/what-wiener-knew-about-artificial</link><guid isPermaLink="false">https://www.avikde.me/p/what-wiener-knew-about-artificial</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Sat, 21 Feb 2026 16:00:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!cQgM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As evidenced by my <a href="https://www.avikde.me/p/what-von-neumann-understood-about">prior post on von Neumann</a>, I believe it&#8217;s crucial to integrate historical context and cross-disciplinary knowledge at this pivotal period of technological change. 
It was recommended to me that I read Norbert Wiener&#8217;s <em>Cybernetics</em>, published even earlier and another pillar in the founding moment of the information age.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cQgM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cQgM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 424w, https://substackcdn.com/image/fetch/$s_!cQgM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 848w, https://substackcdn.com/image/fetch/$s_!cQgM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 1272w, https://substackcdn.com/image/fetch/$s_!cQgM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cQgM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp" width="600" height="450" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Norbert Wiener, mathematician and founder of cybernetics.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Norbert Wiener, mathematician and founder of cybernetics." title="Norbert Wiener, mathematician and founder of cybernetics." srcset="https://substackcdn.com/image/fetch/$s_!cQgM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 424w, https://substackcdn.com/image/fetch/$s_!cQgM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 848w, https://substackcdn.com/image/fetch/$s_!cQgM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 1272w, https://substackcdn.com/image/fetch/$s_!cQgM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Norbert Wiener, the founder of cybernetics (<a href="https://loff.it/society/efemerides/norbert-wiener-matematico-fundador-de-la-cibernetica-216189/">Image source</a>)</figcaption></figure></div><p>Wiener was a child prodigy who received a PhD from Harvard by age 18 and went on to join the MIT mathematics faculty. 
By the account of <em>Dark Hero of the Information Age</em>, the biography by Flo Conway and Jim Siegelman, he was simultaneously one of the most intellectually alive and emotionally turbulent figures in twentieth-century science: marked by manic-depressive episodes and feuds with colleagues, yet capable of a mathematical breadth that few of his contemporaries could match.</p><p>That breadth is visible in the book he published in 1948: <em>Cybernetics, or Control and Communication in the Animal and the Machine</em>. Its thesis was that information flow and message-passing are central to control and communication in both animals and machines. It appeared the same year as Shannon&#8217;s &#8220;A Mathematical Theory of Communication&#8221; and the year before Shockley&#8217;s transistor paper. Wiener was at the center of the founding of the information age, and yet he has been largely forgotten amid recent technological developments. His legacy was overshadowed by Shannon, who had the more implementable theory, and by von Neumann, who had the more implementable architecture.</p><p>Reading <em>Cybernetics</em> now, almost 80 years later, is awe-inspiring and unsettling in equal measure. It is mathematically dense in places and dated in others, but the program it laid out is strikingly relevant to modern AI development. Here are the ideas from it that I found most resonant.</p><p></p><h3>1. Feedback</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YoNn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YoNn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 424w, https://substackcdn.com/image/fetch/$s_!YoNn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 848w, https://substackcdn.com/image/fetch/$s_!YoNn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 1272w, https://substackcdn.com/image/fetch/$s_!YoNn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!YoNn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png" width="404" height="336.003937007874" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:845,&quot;width&quot;:1016,&quot;resizeWidth&quot;:404,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Watt&#8217;s flyball governor&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Watt&#8217;s flyball governor" title="Watt&#8217;s flyball governor" srcset="https://substackcdn.com/image/fetch/$s_!YoNn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 424w, https://substackcdn.com/image/fetch/$s_!YoNn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 848w, https://substackcdn.com/image/fetch/$s_!YoNn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 1272w, https://substackcdn.com/image/fetch/$s_!YoNn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Watt&#8217;s flyball governor (<a href="https://en.wikipedia.org/wiki/Centrifugal_governor">image source</a>)</figcaption></figure></div><p>&#8220;Cybernetics&#8221; originates from the Ancient Greek <strong>&#954;&#965;&#946;&#949;&#961;&#957;&#942;&#964;&#951;&#962; (kybern&#275;t&#275;s)</strong>, meaning &#8220;steersman,&#8221; the same root that, via Latin, gave us the word &#8220;governor.&#8221; It is perhaps no coincidence that Maxwell&#8217;s <a href="https://www.jstor.org/stable/112510">paper on governors</a> was the first known exposition of feedback control. 
I don&#8217;t need to elaborate on the value of feedback in modern technology, but two nontrivial leaps Wiener makes are worth highlighting.</p><p>First, he draws a connection between communication and control in neurology. The feedback loop (sense the error, apply a correction, repeat) describes voluntary movement in biological systems. When this feedback is damaged, as in cerebellar injury, the result is a tremor or oscillation: too aggressive a correction followed by too aggressive a counter-correction. This convergence of engineering control theory and neurology was a founding observation of cybernetics: the same mathematics governs servomechanisms and nervous systems.</p><p>The second leap is the identification of a fundamental tradeoff: <strong>do you invest in modeling or in feedback?</strong> Wiener&#8217;s answer is that it depends on how constant and knowable your system is. He called systems that leverage explicit models <em>compensators</em>, contrasting them with pure feedback mechanisms. In today&#8217;s terms, Wiener&#8217;s compensator needs a world model: an internal representation of how the system behaves that allows action without waiting for error to accumulate. The model vs. feedback tradeoff he identified strongly echoes the one playing out now in the debate between <a href="https://www.avikde.me/p/the-ai-world-models-debate-and-its">scaling-based and structured AI architectures</a>, not to mention <a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">in robotics</a>. Model-free reinforcement learning is a direct descendant of the feedback side of this tradeoff: an agent interacts with an environment, receives a reward signal reflecting the gap between its behavior and a desired outcome, and adjusts its policy accordingly.</p><h3>2. Neuron structure: digital vs. 
analog</h3><p>Wiener asks in the book: in what ways are the computational substrates of brains and machines alike, and in what ways are they fundamentally different?</p><p>Wiener&#8217;s first observation is that neurons obey an &#8220;all-or-none&#8221; law (they fire fully or not at all) and in this sense are digital. This is in tension with von Neumann&#8217;s later analysis, covered in a <a href="https://www.avikde.me/p/what-von-neumann-understood-about">prior post</a>: von Neumann argued that individual neurons function more like small analog computers, with temporal dynamics and nonlinear integration beyond what a simple threshold element can do. The understanding of neuronal computation has deepened considerably since both accounts, and the honest answer is that neurons are neither purely one nor the other.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3zC7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3zC7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 424w, https://substackcdn.com/image/fetch/$s_!3zC7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 848w, https://substackcdn.com/image/fetch/$s_!3zC7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3zC7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3zC7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png" width="550" height="373.4113712374582" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:609,&quot;width&quot;:897,&quot;resizeWidth&quot;:550,&quot;bytes&quot;:111767,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188648218?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3zC7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 424w, https://substackcdn.com/image/fetch/$s_!3zC7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 848w, https://substackcdn.com/image/fetch/$s_!3zC7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 
1272w, https://substackcdn.com/image/fetch/$s_!3zC7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">McCulloch-Pitts neuron models (1943) referenced by Wiener</figcaption></figure></div><p>What followed from the digital view, at least in the engineering tradition, was eventually deep learning: stack enough simple threshold units to sufficient depth, and powerful computation emerges. 
But as the next sections argue, Wiener himself was skeptical that the generic stacking of simple units was sufficient.</p><h3>3. Neuron organization: flexible vs. dedicated</h3><p>Wiener is clear that even if neurons are digital in their firing, their <em>organization</em> is anything but generic. He writes:</p><blockquote><p>The structure of our visual cortex is too highly organized, too specific, to lead us to suppose that it operates by what is after all a highly generalized mechanism.</p></blockquote><p>As a mathematician, he frames this in terms of group theory: the visual system is built to be invariant under transformations of position, rotation, scale, and illumination. Image recognition is then a comparison of structural properties that persist across these transformations, not a comparison of raw photoreceptor signals. The retina has broadly distributed, low-resolution rod cells and foveally concentrated cones, and the layers beyond it extract features at multiple spatial frequencies in parallel. Structure encoded by biology is doing work from the very first stage.</p><p>The unifying point is that the brain does not apply a general-purpose function to raw sensory data and let structure emerge. It applies a pipeline in which each stage is specifically organized to extract the right kind of information. Most modern vision models assume that this structure will emerge from scale and data; approaches such as capsule networks and group-equivariant CNNs attempt to encode it explicitly but remain outside the mainstream. This is the same tension at the heart of the world models debate: whether sufficient scale applied to a general architecture will recover the structure that biology built in deliberately, or whether that structure ought to be encoded explicitly.</p><h3>4. 
The switchboard analogy</h3><p>Wiener is next interested in how neurons are organized, and here his analysis diverges sharply from the digital computer model he was comparing against.</p><p>A digital computer of his era had specific circuits for specific operations: an adder, a multiplier, a comparator, each doing one thing reliably and repeatedly. The brain, he argues, does not work this way. Rather than dedicated permanent circuits, the brain reconfigures its functional connections dynamically, routing signals through different pathways depending on context. He uses the telephone switchboard as his analogy: the same physical wires serve different conversations depending on how the exchange is configured at any moment.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!w04J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!w04J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 424w, https://substackcdn.com/image/fetch/$s_!w04J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 848w, https://substackcdn.com/image/fetch/$s_!w04J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 1272w, 
https://substackcdn.com/image/fetch/$s_!w04J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!w04J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png" width="805" height="201" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:201,&quot;width&quot;:805,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:79768,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188648218?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc01eaa6-0907-4e03-8d09-8cd53e6404b0_898x246.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!w04J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 424w, https://substackcdn.com/image/fetch/$s_!w04J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 848w, https://substackcdn.com/image/fetch/$s_!w04J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 1272w, 
https://substackcdn.com/image/fetch/$s_!w04J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">An image is formed on the photocells (bottom), but then flexibly connected to processing at different size scales (dotted lines). Original image source: <em>Cybernetics.</em></figcaption></figure></div><p>He makes this concrete with the visual processing example depicted above: recognizing a letter regardless of its size (a large &#8220;A&#8221; and a small &#8220;A&#8221;) with a fixed array of photocells. His proposed solution uses a switchable connection layer between the photocell array (bottom) and a fixed set of processing elements (top). By selecting different connection patterns (the diagonal lines), photocell activations at different scales get mapped onto the same processing elements, achieving scale invariance through reconfigurable routing rather than through a learned function. In deep-learning perception, the closest analogues are ideas like spatial pyramid pooling or adaptive pooling.</p><p>In contrast, a vision transformer applies the same operation at every layer to every token, with flexibility coming entirely from learned weights at massive scale. There is no dynamic routing or reconfiguration based on the nature of the input. Wiener pointed out that this approach carries a cost: a large fixed architecture must run in its entirety even when most of it is irrelevant to the current input. A dense 175B-parameter model processing a simple query still activates the full machinery, paying the energy and latency cost of elements that contribute nothing to that particular computation.</p><p>Some modern work moves in Wiener&#8217;s direction. 
Mixture-of-experts architectures route inputs to specialized sub-networks rather than running everything; sparse transformers use dynamic attention patterns; early-exit networks use only as much compute as the input requires. These remain the exception rather than the rule, but each is, in a real sense, an implementation of the switchboard principle Wiener described in 1948.</p><h3>5. Spatial efficiency: foveation</h3><p>A thread running through Wiener&#8217;s treatment of vision is that the brain achieves capable perception not by processing everything uniformly and in parallel, but by being strategically non-uniform in both space and time.</p><p>The spatial side is foveation. The fovea provides high-resolution detail while the periphery offers broad, low-resolution motion detection. The brain doesn&#8217;t passively receive a full image; it actively steers the fovea toward informative regions via saccades, driven by a continuous feedback loop. The implication is that high-resolution processing is a scarce resource allocated dynamically, not applied uniformly.</p><h3>6. Temporal efficiency: the television analogy</h3><p>The temporal side is more surprising. Wiener suggests that the brain may serialize what would otherwise require parallel hardware, using alpha waves (the ~10 Hz electrical rhythms visible in EEGs) as a scanning clock. Just as a television converts a two-dimensional image into a sequential stream by sweeping line by line, the brain may sweep through its representational space cyclically, interrogating stored patterns at each clock cycle. 
The efficiency principle is time-multiplexing: reuse the same hardware over time rather than duplicate it in space.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IyQF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IyQF!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 424w, https://substackcdn.com/image/fetch/$s_!IyQF!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 848w, https://substackcdn.com/image/fetch/$s_!IyQF!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 1272w, https://substackcdn.com/image/fetch/$s_!IyQF!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IyQF!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif" width="400" height="342" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:342,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IyQF!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 424w, https://substackcdn.com/image/fetch/$s_!IyQF!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 848w, https://substackcdn.com/image/fetch/$s_!IyQF!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 1272w, https://substackcdn.com/image/fetch/$s_!IyQF!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Progressive scanning in a TV (<a href="https://msys-mv.blogspot.com/2010/11/understanding-basic-requirements-to.html">image source</a>)</figcaption></figure></div><p>Together these describe a coherent alternative to the architecture modern AI has converged on. Transformers process all positions uniformly in space and all at once in time, which is expensive in both compute and energy. Biology does neither: it allocates spatial resolution selectively and serializes computation over time. Foveation-inspired architectures (glimpse networks, recurrent attention models) and ideas like conditional computation point in this direction but remain outside the mainstream, largely because uniform dense operations map cleanly onto GPU hardware. Wiener&#8217;s architectural intuitions may become increasingly relevant if the rising energy cost of AI makes the efficiency argument more economically compelling.</p><h3>7. 
Avoiding blunders: redundancy and verification</h3><p>The brain produces behavior of remarkable precision despite individual neurons being surprisingly unreliable: they fire spontaneously, transmit probabilistically, and have far worse signal-to-noise ratios than transistors. Wiener&#8217;s answer, developed in the psychopathology chapter, is that there are two complementary strategies for error correction. The first is the &#8220;<a href="https://englishverse.com/poems/the_hunting_of_the_snark">what I tell you three times is true</a>&#8221; strategy: running two or three computing mechanisms simultaneously on the same problem, so that errors can be recognized by disagreement among the parallel channels. The second is backtracking: sequential verification where the system checks its own output and revises when something goes wrong. One is spatial (parallel redundancy), the other is temporal (serial correction): the same space-versus-time tradeoff as before, now applied to reliability rather than perception.</p><p>This maps directly onto one of the most discussed failure modes in LLMs: hallucinations. Wiener&#8217;s framing suggests that they are the expected behavior of a system optimized for speed without redundancy or verification, not simply a quirk to be patched. A single forward pass through a transformer produces an answer with no mechanism for catching its own errors. 
Reasoning models which iterate, self-check, and backtrack are exploiting exactly the reliability/overhead tradeoff Wiener described:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7vb_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7vb_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 424w, https://substackcdn.com/image/fetch/$s_!7vb_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 848w, https://substackcdn.com/image/fetch/$s_!7vb_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 1272w, https://substackcdn.com/image/fetch/$s_!7vb_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7vb_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png" width="1456" height="498" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:498,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:765303,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188648218?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7vb_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 424w, https://substackcdn.com/image/fetch/$s_!7vb_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 848w, https://substackcdn.com/image/fetch/$s_!7vb_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 1272w, https://substackcdn.com/image/fetch/$s_!7vb_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Backtracking in DeepSeek R1 reasoning model (<a href="https://www.reddit.com/r/LocalLLaMA/comments/1id2gox/improving_deepseek_r1_reasoning_trace/">image source</a>, highlights mine)</figcaption></figure></div><p>But verification has limits. As I argued in the <a href="https://www.avikde.me/p/the-ai-world-models-debate-and-its">world models post</a>, a system that lacks a grounded semantic model of the world can cross-check its outputs without ever catching the deeper class of errors that stem from not understanding what it&#8217;s talking about.</p><h2>The human use of human beings</h2><p>Wiener was clearly one of the founders of the information age, but he was also deeply worried about what was being built. 
A passage from his follow-up book <em>The Human Use of Human Beings</em> reads like something written last week:</p><blockquote><p><em>The first industrial revolution was the devaluation of the human arm by the competition of machinery. The modern industrial revolution is similarly bound to devalue the human brain, at least in its simpler and more routine decisions. The average human being of mediocre attainments or less has nothing to sell that it is worth anyone&#8217;s money to buy.</em></p></blockquote><p>He was not presenting this as an inevitable law of nature, and his proposed answer was equally striking: rather than trying to preserve the market value of human labor artificially, he argued that society would need to restructure itself around non-market values like dignity, community, creativity, and meaning. He wrote letters to labor unions warning them of what was coming, but his warnings went largely unheeded.</p><p>In 2026, AI systems are starting to perform, inexpensively, many of the cognitive tasks (writing, coding, analysis, translation, legal research) that defined middle-class professional employment in the twentieth century. The policy infrastructure to manage this transition does not exist. The urgency Wiener felt in 1950, when only a handful of room-sized experimental computers existed, is more justified now.</p><p>Brian Christian, in his introduction to the recent reissue, <a href="https://brooklinebooksmith.com/book/9780063423190">calls Wiener</a> &#8220;the progenitor of contemporary AI safety discourse.&#8221; That may be the most accurate short description of the man. He was not a pessimist or a technophobe; he was a technologist who had thought seriously about what he was building and felt obligated to say what it implied. 
That combination of technical depth, ethical seriousness, and willingness to deliver uncomfortable conclusions publicly is just one more reason to read and remember him.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>I&#8217;m a strong proponent of reading and non-echo-chamber thinking. If you know of any other writing of this ilk, please let me know in the comments. 
If you liked this post, please share it, and subscribe!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/what-wiener-knew-about-artificial/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/what-wiener-knew-about-artificial/comments"><span>Leave a comment</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/what-wiener-knew-about-artificial?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/what-wiener-knew-about-artificial?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><div><hr></div><p><em>This post draws primarily on &#8220;Cybernetics: Or Control and Communication in the Animal and the Machine&#8221; and the biography &#8220;Dark Hero of the Information Age&#8221;. It continues themes from previous posts on von Neumann and world models in AI.</em></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;e2f67af9-7d5c-4dc9-aaae-e68cc06abe79&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What von Neumann understood about the architecture of intelligence before we built AI&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-19T19:17:48.188Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!_hYZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/what-von-neumann-understood-about&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:185086427,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;721822df-bdf5-4011-b9b2-b1be2d6818f1&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The AI world models debate and its foreshadowing on robotics&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-14T08:18:52.656Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Xry4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-ai-world-models-debate-and-its&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:184309659,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:4,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[Cache effects in object-oriented code: computer architecture meets programming]]></title><description><![CDATA[A simple demonstration revealed five layers of computer science & engineering abstraction fighting each other]]></description><link>https://www.avikde.me/p/cache-effects-in-object-oriented</link><guid isPermaLink="false">https://www.avikde.me/p/cache-effects-in-object-oriented</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Tue, 10 Feb 2026 15:30:18 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!p-27!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Having worked in robotics research and industry for over a decade, I&#8217;ve debugged enough real-time control loops to know that the programming language abstraction can be misleading. We write object-oriented code because it&#8217;s maintainable and composable, and because it maps cleanly to our mental models. A robot has limbs, limbs have joints, joints have positions and velocities, so we should create a hierarchy of objects accordingly, right?</p><p>But when battery life is crucial, and when microseconds matter for keeping control loops stable, the hardware doesn&#8217;t care about an elegant class hierarchy or beautiful code. The end product of programming is <a href="https://www.youtube.com/watch?v=fHNmRkzxHWs">data transformation</a>, not the code itself.</p><p>This post, written with my friend <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Greg Anderson&quot;,&quot;id&quot;:61562392,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b36e1378-3607-4d80-ba5f-2afa31a28123_144x144.png&quot;,&quot;uuid&quot;:&quot;18e87c2b-6325-4808-8804-0a4f47210032&quot;}" data-component-name="MentionToDOM"></span> (software engineer and CS lecturer), started as a simple teaching example about Array-of-Structures vs. Structure-of-Arrays (AoS vs. SoA) layouts. 
We thought we&#8217;d show a clean and universal performance curve demonstrating cache effects tied to C++ code.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p-27!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p-27!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 424w, https://substackcdn.com/image/fetch/$s_!p-27!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 848w, https://substackcdn.com/image/fetch/$s_!p-27!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!p-27!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p-27!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg" width="732" height="755" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:755,&quot;width&quot;:732,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121960,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p-27!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 424w, https://substackcdn.com/image/fetch/$s_!p-27!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 848w, https://substackcdn.com/image/fetch/$s_!p-27!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!p-27!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The memory hierarchy levels that one of the 2 CPU cores would access (excluding the system level; L1 is inside the core). <a href="https://wccftech.com/a16-bionic-die-shot-details/">Original image source</a>.</figcaption></figure></div><p>We built what seemed like a straightforward benchmark: measure access time for different memory strides. What we didn&#8217;t anticipate was encountering five distinct issues spanning multiple abstraction layers, from compiler behavior to microarchitecture to hardware characteristics:</p><ol><li><p>The compiler deleted our measurement code and unpredictably stored variables in memory vs. 
registers</p></li><li><p>The CPU&#8217;s pipeline hazards dominated our memory access time</p></li><li><p>The CPU&#8217;s dynamic frequency scaling skewed our results</p></li><li><p>The hardware prefetcher made our predictions wrong</p></li><li><p>Different processors gave wildly different results</p></li></ol><p>This illustrates the gap between abstraction and performance. Programming languages provide abstraction above the hardware, but achieving good performance requires understanding how code executes on the underlying architecture. While some of our issues may be familiar to experienced programmers, others might be surprising even to veterans.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h3>Four examples showing why you should care</h3><p>There are many real-world examples using an "array of structures" organization for good reasons: it's faster to prototype, easier to reason about when objects manage their own state, and typically more readable for developers.</p><p><strong>Example 1: PCL (Point Cloud Library) </strong><a href="https://pointclouds.org/documentation/point__types_8hpp_source.html">PointXYZRGB</a> structure.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ckiW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ckiW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 424w, https://substackcdn.com/image/fetch/$s_!ckiW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 848w, https://substackcdn.com/image/fetch/$s_!ckiW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ckiW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ckiW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png" width="1456" height="501" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf47b546-6854-4917-bc18-d35443322840_2184x752.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:501,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:139001,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ckiW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 424w, https://substackcdn.com/image/fetch/$s_!ckiW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 848w, https://substackcdn.com/image/fetch/$s_!ckiW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 
1272w, https://substackcdn.com/image/fetch/$s_!ckiW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>When you have <code>pcl::PointCloud&lt;PointXYZRGB&gt;</code> with millions of points, the memory layout looks like</p><pre><code>[x0, y0, z0, pad, rgb0, x1, y1, z1, pad, rgb1, ...]</code></pre><p>For an example task of filtering by distance (operating on the xyz only), we get 40% extra cache misses. 
For a color segmentation task operating on rgb only, we get 4x extra cache misses.</p><p><strong>Example 2: Unity <a href="https://docs.unity3d.com/510/Documentation/Manual/TheGameObject-ComponentRelationship.html">GameObject-Component System</a></strong>. GameObjects directly contain Component instances by value, e.g. a GameObject with Transform, Rigidbody, and Collider components stored as member data. This is classic AoS: each GameObject owns its component data, providing flexible composition but poor cache locality when iterating over many objects.</p><p><strong>Example 3: Box2D (version 2.x). </strong>Each <code>b2Body</code> contains position, velocity, and force data as members (e.g. <code>b2Vec2 m_linearVelocity</code>). Most traditional object-oriented game engines before the <a href="https://cowboyprogramming.com/2007/01/05/evolve-your-heirachy/">ECS trend</a> used composition with value semantics: each enemy/player/NPC object contained all its data directly. However, Box2D v3.0 (2024) moved away from this, and now uses handle-based IDs and stores body data separately for better performance.</p><p><strong>Example 4: Humanoid joints. 
</strong>Last but not least, here is a practical example of a humanoid robot joint that should be quite relatable:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_KTd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_KTd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 424w, https://substackcdn.com/image/fetch/$s_!_KTd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 848w, https://substackcdn.com/image/fetch/$s_!_KTd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 1272w, https://substackcdn.com/image/fetch/$s_!_KTd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_KTd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png" width="1456" height="621" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:621,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:223276,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_KTd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 424w, https://substackcdn.com/image/fetch/$s_!_KTd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 848w, https://substackcdn.com/image/fetch/$s_!_KTd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 1272w, https://substackcdn.com/image/fetch/$s_!_KTd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A Limb would be composed of Joints, with Joint specialized into different joint types.</p><p>Suppose the humanoid robot has 50 joints. Computing Jacobians requires accessing each joint&#8217;s position: 5.5kB loaded into cache (87 cache misses), when we only need 200 bytes (4 cache misses if organized as an array of positions).</p><p>Now that we have shown that this organization occurs commonly, we will dig in and try to measure the effect it has.</p><h3>An even simpler example to dig into</h3><p>We created an even simpler example with a single data array and a parameterized &#8220;stride&#8221; for a strided access pattern. This would occur in the example above with <code>stride = sizeof(Joint)</code>. 
Our goal was to time how long it takes to access a fixed number of elements with different strides, as in the code below.</p><p><em>The actual code for replicating all these measurements, and more, is <a href="https://github.com/avikde/caching-tester">on github</a>: feel free to try it out.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!avv2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!avv2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 424w, https://substackcdn.com/image/fetch/$s_!avv2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 848w, https://substackcdn.com/image/fetch/$s_!avv2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 1272w, https://substackcdn.com/image/fetch/$s_!avv2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!avv2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png" width="1456" height="1390" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1390,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:451153,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!avv2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 424w, https://substackcdn.com/image/fetch/$s_!avv2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 848w, https://substackcdn.com/image/fetch/$s_!avv2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 1272w, https://substackcdn.com/image/fetch/$s_!avv2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What we expect to see:</strong> As we access data, the processor loads it from main memory into cache in fixed-size blocks (cache lines), so each access also pulls in its neighbors.</p><pre><code>Data:    |x| | | | |x| |...
          &#8592; stride &#8594;
Cache:   |y|y|y|y|y|y|y|y|z|z|...
          &#8592; line size  &#8594;</code></pre><p>As stride increases, visiting the same number of elements touches more cache lines, so more blocks must be brought into the cache. If memory movement dominates, we expect time to rise linearly as stride increases and more cache lines are touched. (More below on what happens once each access hits a separate cache line.)</p><p>Understanding the results from this "simple" example felt like peeling endless layers of an onion, but was very gratifying at the same time!</p><h4>Issue 1: Controlling compiler optimizations</h4><p>With the code snippet above, the <a href="https://godbolt.org/z/bxdMzcn4c">assembler output</a> showed:</p><pre><code>testStride(unsigned long):
        ret
data:
        .zero   256000000</code></pre><p>Of course! <code>sink</code> was being optimized out, and my firmware programming background caused me to add a volatile to its declaration. However, something in the asm output for the loop looked amiss. Can you spot it?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HWZZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HWZZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 424w, https://substackcdn.com/image/fetch/$s_!HWZZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 848w, https://substackcdn.com/image/fetch/$s_!HWZZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!HWZZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HWZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png" width="1456" height="673" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:673,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:216623,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HWZZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 424w, https://substackcdn.com/image/fetch/$s_!HWZZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 848w, https://substackcdn.com/image/fetch/$s_!HWZZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!HWZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>While elements of <code>data</code> must be loaded from memory into a register, <code>sink</code> should be able to stay in a register for the whole loop. However, <code>volatile</code> forces it to be loaded and stored on every iteration, because the compiler must assume that it can be modified externally. 
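</p><p>To make the difference concrete, here is a minimal sketch of the same strided sum written both ways. It is not the code from this post: the function names <code>sumVolatile</code> and <code>sumLocal</code>, the local accumulators, and the choice to return the result are all assumptions made for illustration.</p><pre><code>// Hypothetical sketch, not the benchmark from this post.

// Variant A: accumulate into a volatile. Each update must read the current
// value from memory and write the new value back, so the running sum
// cannot simply live in a register.
float sumVolatile(const float* data, unsigned long count, unsigned long stride) {
    volatile float sink = 0.0f;
    for (unsigned long k = 0; k != count; ++k) {
        sink = sink + data[k * stride];
    }
    return sink;
}

// Variant B: accumulate into a plain local. The compiler is free to keep acc
// in a register; returning it keeps the result observable so the loop is
// not discarded as dead code.
float sumLocal(const float* data, unsigned long count, unsigned long stride) {
    float acc = 0.0f;
    for (unsigned long k = 0; k != count; ++k) {
        acc += data[k * stride];
    }
    return acc;
}</code></pre><p>Both variants compute the same value; only the generated loads and stores differ. Returning the sum is one way to keep the work alive, and a comparison against a sentinel value is another.</p><p>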
So we get rid of volatile, and uncomment the last line:</p><pre><code>if (sink == -1.0f) std::cout &lt;&lt; "";</code></pre><p>The new loop looks like</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DZpq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DZpq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 424w, https://substackcdn.com/image/fetch/$s_!DZpq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 848w, https://substackcdn.com/image/fetch/$s_!DZpq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 1272w, https://substackcdn.com/image/fetch/$s_!DZpq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DZpq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png" width="1456" height="573" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:573,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:159673,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DZpq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 424w, https://substackcdn.com/image/fetch/$s_!DZpq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 848w, https://substackcdn.com/image/fetch/$s_!DZpq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 1272w, https://substackcdn.com/image/fetch/$s_!DZpq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Compared to the assembly above, the extra load and store are gone: first mystery solved.</p><p><em>Issue source: compiler / programming language</em></p><h4>Issue 2: Data dependency hazard</h4><p>The relevant part of the loop looked like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XNLx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!XNLx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 424w, https://substackcdn.com/image/fetch/$s_!XNLx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 848w, https://substackcdn.com/image/fetch/$s_!XNLx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 1272w, https://substackcdn.com/image/fetch/$s_!XNLx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XNLx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png" width="1456" height="652" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:652,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:138816,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!XNLx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 424w, https://substackcdn.com/image/fetch/$s_!XNLx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 848w, https://substackcdn.com/image/fetch/$s_!XNLx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 1272w, https://substackcdn.com/image/fetch/$s_!XNLx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Timing this loop as we varied stride showed that for the first few strides, increasing stride had <em>no effect on the time</em> (solid lines in the plot below). With an Apple M2:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/zv0Qi/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1380f094-4597-42e7-aea9-c0e8e7288f63_1220x782.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e308d11-af64-4677-9824-988cd411b049_1220x852.png&quot;,&quot;height&quot;:418,&quot;title&quot;:&quot;Accumulate - M2 - Time(us) vs. Stride(floats)&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/zv0Qi/2/" width="730" height="418" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>With the size of our loop, increasing stride definitely means more cache lines are touched, but it is making no difference. 
What&#8217;s going on?</p><p>Let&#8217;s look back at <a href="https://godbolt.org/z/GqKzdPf7h">the assembly</a> (same as the previous snippet). </p><p>If we manually unroll a few iterations, we have the following pattern:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HVjb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HVjb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 424w, https://substackcdn.com/image/fetch/$s_!HVjb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 848w, https://substackcdn.com/image/fetch/$s_!HVjb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 1272w, https://substackcdn.com/image/fetch/$s_!HVjb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HVjb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png" width="1456" height="526" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:526,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:131052,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HVjb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 424w, https://substackcdn.com/image/fetch/$s_!HVjb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 848w, https://substackcdn.com/image/fetch/$s_!HVjb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 1272w, https://substackcdn.com/image/fetch/$s_!HVjb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>FP add 2 cannot start executing until FP add 1 has produced its result</em>, a classic Read-After-Write hazard. A chip designer understands this very well, but a programmer rarely needs to think about data dependency hazards in CPU pipelining. 
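</p><p>In source terms, the dependency chain and one common way to break it look roughly like the sketch below. This is an illustration under assumptions, not the code from this post: the function names and the choice of four accumulators are hypothetical.</p><pre><code>// Hypothetical sketch, not the benchmark from this post.

// One accumulator: every add needs the acc produced by the previous add,
// so the adds form a single serial chain.
float sumChained(const float* data, unsigned long count, unsigned long stride) {
    float acc = 0.0f;
    for (unsigned long k = 0; k != count; ++k) {
        acc += data[k * stride];
    }
    return acc;
}

// Four accumulators: the adds into a0, a1, a2 and a3 do not depend on one
// another, so the core can overlap them instead of waiting on a single chain.
float sumFourChains(const float* data, unsigned long count, unsigned long stride) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    unsigned long k = 0;
    // Main loop: four independent adds per trip, over a multiple of 4 elements.
    for (unsigned long end = count - count % 4; k != end; k += 4) {
        a0 += data[(k + 0) * stride];
        a1 += data[(k + 1) * stride];
        a2 += data[(k + 2) * stride];
        a3 += data[(k + 3) * stride];
    }
    // Tail: leftover elements when count is not a multiple of 4.
    for (; k != count; ++k) {
        a0 += data[k * stride];
    }
    return (a0 + a1) + (a2 + a3);
}</code></pre><p>Splitting the sum changes the order of the floating-point additions, which can change the rounded result. That is why compilers generally do not make this transformation on their own unless told that reassociation is acceptable, for example with <code>-ffast-math</code> in GCC and Clang.</p><p>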
In this example, the float add dominates the effects from the load/store due to the data dependency and the long latency of floating-point add.</p><p>We add an unrolled version:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XTJf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XTJf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 424w, https://substackcdn.com/image/fetch/$s_!XTJf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 848w, https://substackcdn.com/image/fetch/$s_!XTJf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 1272w, https://substackcdn.com/image/fetch/$s_!XTJf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XTJf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png" width="1456" height="1263" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1263,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:567714,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XTJf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 424w, https://substackcdn.com/image/fetch/$s_!XTJf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 848w, https://substackcdn.com/image/fetch/$s_!XTJf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 1272w, https://substackcdn.com/image/fetch/$s_!XTJf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The unrolled version is significantly faster, as the dashed lines in the plot above show, and more importantly, we now see the linear rise we had predicted.</p><p><em>Issue source: microarchitecture, not visible in assembly instructions</em></p><h4>Issue 3: Warmup effects</h4><p>After root-causing issue 2, to avoid dealing with the unrolled loop, we changed the accumulate to a Read-Modify-Write. 
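</p><p>A read-modify-write loop of this kind might look like the sketch below. The function name <code>bumpStrided</code> and the increment of <code>1.0f</code> are assumptions for illustration and are not taken from the code in this post.</p><pre><code>// Hypothetical sketch, not the benchmark from this post.
// Update each visited element in place instead of summing into one variable.
void bumpStrided(float* data, unsigned long count, unsigned long stride) {
    for (unsigned long k = 0; k != count; ++k) {
        // Load data[k * stride], add to it, store it back. Each iteration
        // works on a different address, so no value is carried in a register
        // from one iteration to the next.
        data[k * stride] += 1.0f;
    }
}</code></pre><p>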
The time for each iteration is now longer because a load and store are required for each iteration, which should make data movement costs the dominating factor.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cAkz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cAkz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 424w, https://substackcdn.com/image/fetch/$s_!cAkz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 848w, https://substackcdn.com/image/fetch/$s_!cAkz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 1272w, https://substackcdn.com/image/fetch/$s_!cAkz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cAkz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png" width="1456" height="591" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/efff1b84-497c-4c57-93e9-41490759e252_2080x844.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:591,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:119668,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cAkz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 424w, https://substackcdn.com/image/fetch/$s_!cAkz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 848w, https://substackcdn.com/image/fetch/$s_!cAkz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 1272w, https://substackcdn.com/image/fetch/$s_!cAkz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A number of stateful microarchitectural effects unrelated to the data cache contribute to performance characteristics, yet produce data cache-like behaviors. Such factors may include page table caching, page walk caching, prefetcher training, the memory controller, and even frequency ramping.</p><p>We attempted to stabilize the effect of these factors before running trials by running a warmup function at the beginning of the program. 
The warmup simply iterates over every element of data once to have the cache in a predictable state.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ef2a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ef2a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 424w, https://substackcdn.com/image/fetch/$s_!Ef2a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 848w, https://substackcdn.com/image/fetch/$s_!Ef2a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 1272w, https://substackcdn.com/image/fetch/$s_!Ef2a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ef2a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png" width="1456" height="1821" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1821,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:450653,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ef2a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 424w, https://substackcdn.com/image/fetch/$s_!Ef2a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 848w, https://substackcdn.com/image/fetch/$s_!Ef2a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 1272w, https://substackcdn.com/image/fetch/$s_!Ef2a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The results:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/QZG6B/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/297e46d2-7e49-4823-a44d-bec6357008b9_1220x818.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b8348ed8-e5b0-4f6e-9ef3-571ac613291c_1220x888.png&quot;,&quot;height&quot;:435,&quot;title&quot;:&quot;Read-Modify-Write - M2 - Time(us) vs. 
Stride(floats)&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/QZG6B/2/" width="730" height="435" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>The warmup appears to make the program faster at every stride we tested (the effect is more pronounced on a different system, in the plots below). Our best guess is that the dominant effect is the warmup ramping up the CPU frequency. We also considered whether the trial for one stride was affecting the next, but running a single stride per run of the program didn&#8217;t yield clearer results (and took much longer).</p><p>Again, if you have any better ideas, we would love to know - please leave a comment!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/cache-effects-in-object-oriented/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/cache-effects-in-object-oriented/comments"><span>Leave a comment</span></a></p><p><em>Issue source: microarchitecture / hardware</em></p><h4>Issue 4: Initial no-effect; second slope after cache line boundary</h4><p><strong>4a) </strong>In the previous plot, there is an initial region up to a stride of about 5 floats (20 bytes) where we predicted a linear rise, but instead see no effect of stride on timing.</p><p>While we are not sure, this is likely due to hardware 
prefetching: modern CPUs have hardware prefetchers that detect sequential / strided access patterns and automatically fetch data ahead of time. Once the stride grows large enough (~20-64 bytes), the prefetcher can no longer keep up, either because it can&#8217;t fetch far enough ahead or because the access pattern becomes too sparse for it to predict. At this point, we finally see the expected linear increase as each access genuinely waits for data from main memory.</p><p><strong>4b) </strong>We expected the access time to plateau once every access was hitting a different cache line. However, there appears to be a slower rise after the cache line boundary, at least on the Apple M2 processor.</p><p>Some (unconfirmed) hypotheses for the slower rise after the boundary:</p><ul><li><p>L1 &#8594; L2 spilling if the working set exceeds L1 capacity, incorporating L2 access times</p></li><li><p>TLB misses as large strides access many different memory pages</p></li></ul><p><em>Issue sources: microarchitecture / hardware</em></p><h4>Issue 5: Different behavior on different processors</h4><p>While uncovering the previous issues, we ran a few tests on other processors, and unfortunately that only served to increase the number of unknowns. In this section we show some of those results, but can only speculate about what causes them.</p><p>With an AMD Zen5 processor:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/YGC14/3/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00cc2fe8-4256-4837-ac82-b8e1189cd916_1220x782.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6bc5baad-538b-4973-bd73-8f10a69020b9_1220x852.png&quot;,&quot;height&quot;:418,&quot;title&quot;:&quot;Accumulate - Zen5, MSVC - Time(us) vs. 
Stride(floats)&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/YGC14/3/" width="730" height="418" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/H1CJl/4/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f4941e2-4668-43c5-8a6c-04802a6d67ab_1220x818.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d74b2b08-b613-4628-b95b-95010c14d2f5_1220x938.png&quot;,&quot;height&quot;:461,&quot;title&quot;:&quot;Read-Modify-Write - Zen5, MSVC - Time(us) vs. 
Stride(floats)&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/H1CJl/4/" width="730" height="461" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>We see a plateau after an initial rise, which matches our naive prediction.</p><p>However, we also observe a <strong>peak around 32 floats (128 bytes) followed by a drop</strong>. We don&#8217;t have a confirmed explanation for this behavior, though it may be related to more advanced prefetcher behavior. In other words, the hardware may be making assumptions about our access pattern, and strides of 64-128 bytes hit the worst-case scenario where those assumptions fail. 
If you have any ideas about the cause, let us know in the comments!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/cache-effects-in-object-oriented/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/cache-effects-in-object-oriented/comments"><span>Leave a comment</span></a></p><p>We also tested on an Intel processor on Windows, which confirmed that some of the strangest aspects of the two plots above stem from the AMD processor, and not the compiler.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/hqmDr/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79ecdefe-e9c5-4b64-a2fc-30345ffbe2b6_1220x782.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a3ed93d-feec-4b04-a686-8d67812b078a_1220x902.png&quot;,&quot;height&quot;:418,&quot;title&quot;:&quot;Read-Modify-Write - Intel MSVC - Time(us) vs. 
Stride(floats)&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/hqmDr/2/" width="730" height="418" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>This resembles our Apple M2 plots more closely, including the slower rise after the cache line boundary. It also adds an even slower rise after 2x the cache line boundary.</p><p><em>Issue sources: secret microarchitectural optimizations</em></p><h3>Back to programming</h3><p>Through this journey, it is safe to say we learned a lot about the complexity of modern processors. Fortunately, though, our central point about the (initial, then plateauing) rise of access time with stride held on every processor we tested. Phew!</p><p>How do we use this knowledge as programmers? <strong>The key is to ensure that commonly-accessed data is packed tightly in contiguous memory.</strong></p><p>The naive OOP concept of owning data:</p><ul><li><p>The class directly contains/owns the data as member variables</p></li><li><p>Example: Joint class with float sensed_position (and other things) embedded in it</p></li><li><p>This creates the AoS memory layout problem</p></li></ul><p><strong>Instead, store indices. 
</strong>In the literature on data-oriented design, this approach appears as the Entity-Component-System (ECS) pattern, or as data-oriented design with handles.</p><ul><li><p>The class contains references, pointers, or indices to data stored elsewhere</p></li><li><p>This allows you to keep polymorphism while avoiding AoS layout issues</p></li></ul><p><strong>It isn&#8217;t data-oriented vs. polymorphism. </strong>To reiterate that data-oriented design is not opposed to OOP conveniences, consider that Pinocchio <a href="https://github.com/stack-of-tasks/pinocchio/blob/devel/include/pinocchio/multibody/joint/joint-model-base.hpp">uses polymorphism to specialize functions</a>, but stores indices into the vectors, not the data itself. The actual positions and velocities live in contiguous arrays, giving a cache-friendly SoA layout, while the polymorphic joint models provide the OOP interface. You can have the benefits of polymorphism (different joint types with specialized behavior) without the memory layout problems of AoS. This is the middle ground between pure OOP with composition and abandoning OOP entirely for data-oriented design.</p><h3>Closing thoughts</h3><p>In this post, we first showed how OOP-thinking can naturally lead to suboptimal cache usage, with several real examples. Then we looked at the effects this can have, uncovering many interesting &#8220;side-quest&#8221; root-causing exercises.</p><p>It isn&#8217;t a coincidence that modern performance-critical systems say no to naive composed OOP:</p><ul><li><p><strong>Machine learning</strong> libraries will often select the data layout (NCHW etc.) 
<a href="https://mlsysbook.ai/book/contents/core/hw_acceleration/hw_acceleration.html#sec-ai-acceleration-memoryefficient-tensor-layouts-e250">transparently</a>, optimizing for cache locality.</p></li><li><p><strong>Pinocchio</strong>, a robotics kinematics / dynamics library, has its functions <a href="https://github.com/search?q=repo%3Astack-of-tasks/pinocchio%20forwardKinematics&amp;type=code">access array data</a>.</p></li><li><p><strong>Drake</strong>, a larger robotics-oriented library, eventually <a href="https://github.com/RobotLocomotion/drake/blob/master/multibody/tree/multibody_tree-inl.h">has data in arrays</a> below abstraction layers.</p></li><li><p><strong><a href="https://unity.com/dots">Unity DOTS</a></strong> stores all Transform data in packed arrays, not in GameObjects.</p></li><li><p><strong>Box2D v3.0</strong> switched from OOP bodies to ID-based handles with SoA storage.</p></li><li><p><strong><a href="https://dev.epicgames.com/documentation/en-us/unreal-engine/mass-entity-in-unreal-engine">Unreal Mass Entity</a></strong> is an ECS system for high-object-count scenarios.</p></li></ul><p>Even if in an isolated example the performance gain seems small, these patterns occur so frequently that they can <a href="https://youtu.be/fHNmRkzxHWs">add up to large losses that are difficult to eliminate</a>.</p><p>Thanks for reading! If you enjoyed this kind of full-stack analysis and root-causing, please share and subscribe for more posts on robotics, AI, and computing.</p><h3>References and further reading</h3><ul><li><p>Code for demonstrations in this post, and more on <a href="https://github.com/avikde/caching-tester">github</a></p></li><li><p>&#8220;Better memory representation&#8221; in Jeff Dean&#8217;s &#8220;<a href="https://abseil.io/fast/hints.html#better-memory-representation">Performance Hints</a>&#8221;</p></li><li><p>&#8220;<a href="https://youtu.be/fHNmRkzxHWs">Efficiency with Algorithms, Performance with Data Structures</a>&#8221; - Chandler Carruth [CppCon 2014]. 
<strong>Note: </strong>I don&#8217;t fully agree with the statement (10:45) that &#8220;efficiency is only affected by algorithms&#8221;. A counterexample is the energy cost of moving a byte from DRAM -&gt; core, which is significantly higher than from L1: the same algorithm with poor cache performance consumes more energy, in addition to completing more slowly.</p></li><li><p><a href="https://youtu.be/rX0ItVEVjHc">Data-Oriented Design and C++</a> - Mike Acton [CppCon 2014]</p></li><li><p>Explicit cache control via <a href="https://en.wikipedia.org/wiki/Cache_control_instruction">software prefetching</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA["Is it learning?" Online motor adaptation in end-to-end robotics]]></title><description><![CDATA[Part 2: Where the low-level controller responds to the unexpected]]></description><link>https://www.avikde.me/p/is-it-learning-online-motor-adaptation</link><guid isPermaLink="false">https://www.avikde.me/p/is-it-learning-online-motor-adaptation</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Tue, 03 Feb 2026 17:51:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x8Re!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This article is part of a series on end-to-end robotics pipelines:</em></p><ol><li><p><a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">Architecture of end-to-end: learning &#8594; control</a></p></li><li><p>This article</p></li><li><p><a href="https://www.avikde.me/p/debugging-as-architecture-insight">Dissecting a VLA</a></p></li><li><p><a href="https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">Closing the action loop with a VLM &#8220;agent&#8221;</a></p></li><li><p><a 
href="https://www.avikde.me/p/building-a-reasoning-hierarchical">Demo combining the best features of end-to-end and classical approaches</a></p></li></ol><div><hr></div><p>Last week, I wrote about modern end-to-end robotics pipelines: why they are the new north star, and the hidden architecture behind successful implementations. Part 1 reviewed some implementations showing signs of a <strong>cascade of a high-level (HL) &#8594; low-level (LL) controller</strong> at the actuation end of the pipeline:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0d685309-d58e-46dd-9d5c-6f3be4457d31&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The architecture behind &#8220;end-to-end&#8221; robotics pipelines&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Writing about safe, efficient AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-26T21:19:56.368Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!766O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-architecture-behind-end-to-end&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:185869291,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:5,&quot;comment_count&quot;:11,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>I have first-hand experience demonstrating walking robots to customers in a sandy desert, and as the robots slipped they asked, <strong>&#8220;is it learning?&#8221;</strong> Since people adapt their gait when they walk on ice (for example), it is reasonable to expect that an isolated robot can adjust its behavior after some trial &amp; error or adaptation period.</p><p>However, this is not how naive foundation-model end-to-end pipelines (such as those covered in part 1) work today; a particular robot can only change its behavior once the &#8220;hive brain&#8221; is retrained with new data. 
Due to the size of these models, it is impractical for training to happen on-device or frequently.</p><p>So, in part 2, we ask: <strong>how can a fielded robot adapt to unexpected conditions? </strong>Why do we even need adaptability? Given the HL &#8594; LL controller cascade structure in modern end-to-end pipelines from part 1, where does this adaptability live, and how does it affect the mapping to computing hardware? Lastly, we will also look at some published implementations and see how they approach or ignore this issue.</p><h3>Updates to part 1, hot off the presses</h3><p>Before we dig into part 2, I need to add a couple of updates from relevant news releases that I wasn&#8217;t able to review before <a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">part 1</a> was published (Jan 26):</p><ol><li><p><strong>Microsoft&#8217;s Rho-Alpha model announcement with <a href="https://open.substack.com/pub/bdtechtalks/p/inside-rho-alpha-microsofts-new-robotics?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">commentary on TechTalks</a> (Jan 24) reveals a &#8220;split architecture&#8221; including a dedicated low-level controller, underscoring at least two points in the part 1 post. </strong>(a) Tactile and proprioceptive information is incorporated in the action expert, showing that the action head facilitates <em>feedback</em> loops; (b) higher <em>control bandwidth</em> via a so-called bypass mechanism. Quoting the post, &#8220;The long-term goal, Kolobov said, is to have the action expert or a part of it operate on proprioception and physical sensing modalities at a significantly higher frequency than on visual and language data.&#8221;</p></li><li><p><strong><a href="https://www.figure.ai/news/helix-02">Figure Helix 02 Jan 27 update</a> reveals a new &#8220;System 0&#8221; controller, underscoring at least four points in the part 1 post</strong>. 
The &#8220;System 0&#8221; implementation is described as a dedicated whole-body controller (WBC); a WBC conventionally converts desired accelerations or velocities to joint torques based on a model of the robot. (a) S1 went from controlling the upper body to the whole body, and this reduced the overall system complexity by <em>separating concerns</em>; (b) S0 and S1 incorporate tactile data in tighter <em>feedback loops</em>, without adding complexity to the large VLM S2; (c) S0 runs at a kHz rate, increasing the last-level <em>control bandwidth</em>; (d) S0 is trained for that specific robot (vs. cross-embodiment), localizing robot body-related parameters in one place (and presumably enabling generalization of S2/S1 to a different robot). The purpose of this WBC is similar to that of the model-based reference in part 1, but the difference here is that it is also a neural network trained from data.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p></li></ol><p>I expect that we will continue to see further evidence and refinement of hierarchical control structures in commercial robots, as opposed to unstructured end-to-end pipelines. Make sure to subscribe to get future updates:</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><em>This publication and this post contain the author&#8217;s personal thoughts and opinions only, and do not reflect the views of any companies or institutions.</em></p><h3>Why do we need adaptability?</h3><p>When a robot leaves the lab and is in customers&#8217; hands, it will at some point inevitably encounter an unexpected situation stemming from component failure, perturbation, environmental conditions, or operating conditions (e.g. payload). To address this, one recourse is to build a large-enough model that has enough experience to handle all these situations (e.g. via domain randomization or multi-embodiment training). 
This of course takes (much) more data and more training, as OpenAI showed from their dexterity result in 2019:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NNYM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NNYM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 424w, https://substackcdn.com/image/fetch/$s_!NNYM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 848w, https://substackcdn.com/image/fetch/$s_!NNYM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 1272w, https://substackcdn.com/image/fetch/$s_!NNYM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NNYM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg" width="505" height="306.68016194331983" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:150,&quot;width&quot;:247,&quot;resizeWidth&quot;:505,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Learning Progress graph&quot;,&quot;title&quot;:&quot;Learning Progress graph&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Learning Progress graph" title="Learning Progress graph" srcset="https://substackcdn.com/image/fetch/$s_!NNYM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 424w, https://substackcdn.com/image/fetch/$s_!NNYM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 848w, https://substackcdn.com/image/fetch/$s_!NNYM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 1272w, https://substackcdn.com/image/fetch/$s_!NNYM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Plot from <a href="https://arxiv.org/pdf/1808.00177">OpenAI Dactyl paper</a> (2019) showing the difference in required training without and with domain randomization (note the log-scale).</figcaption></figure></div><p>The other option is 
adaptation in a strategic part of the pipeline to address as many of these variations as possible. In this post, we focus on the action end of the pipeline, and the classes of variation we are interested in include variability in joints / motors (friction, motor torque), terrain properties, and payload.</p><p>Let&#8217;s clarify the timescale hierarchy, because the word &#8220;adaptation&#8221; can refer to changes at various timescales. Within-movement corrections happen in milliseconds, and are typically part of reactive control within the low-level controller. Skill acquisition across many tasks using large datasets typically happens offline, during training. The intermediate adjustment on the seconds-to-minutes timescale, which we refer to as motor adaptation, is the focus of this post.</p><h3>Historical context from biology, control theory, and LLMs</h3><p>Cerebellar timescales (seconds to minutes) match closely with the motor adaptation timescale referred to above, and several research efforts identify the cerebellum&#8217;s role in adapting behavior in that time range.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x8Re!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x8Re!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 424w, https://substackcdn.com/image/fetch/$s_!x8Re!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 848w, 
https://substackcdn.com/image/fetch/$s_!x8Re!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 1272w, https://substackcdn.com/image/fetch/$s_!x8Re!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x8Re!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png" width="1130" height="319" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d660113e-a952-423f-ae02-10f42c37f790_1130x319.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:319,&quot;width&quot;:1130,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:307379,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/186635241?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!x8Re!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 424w, https://substackcdn.com/image/fetch/$s_!x8Re!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 
848w, https://substackcdn.com/image/fetch/$s_!x8Re!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 1272w, https://substackcdn.com/image/fetch/$s_!x8Re!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure from <a href="https://pubmed.ncbi.nlm.nih.gov/26646076/">Weaver (2015)</a> (commentary on <a 
href="https://pubmed.ncbi.nlm.nih.gov/26645916/">Kim (2015)</a>) showing the role of the cerebellum in storing multiple internal models, and adapting at different timescales.</figcaption></figure></div><p><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC6674518">Morton (2006)</a> further associates the cerebellum with motor adaptation and the spinal cord with reactive control:</p><blockquote><p>Cerebellar damage does not impair the ability to make reactive feedback-driven motor adaptations, but significantly disrupts predictive feedforward motor adaptations during splitbelt treadmill locomotion &#8230; The cerebellum seems to play an essential role in predictive but not reactive locomotor adjustments. We postulate that reactive adjustments may instead be predominantly controlled by lower neural centers, such as the spinal cord or brainstem.</p></blockquote><p>In control theory, there is a long tradition of adaptive control, including model-reference adaptive control (MRAC), which uses a (model-based) adjustment mechanism to modify the parameters of the controller.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Uzct!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uzct!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 424w, https://substackcdn.com/image/fetch/$s_!Uzct!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 848w, 
https://substackcdn.com/image/fetch/$s_!Uzct!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 1272w, https://substackcdn.com/image/fetch/$s_!Uzct!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uzct!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png" width="462" height="266.2916666666667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:332,&quot;width&quot;:576,&quot;resizeWidth&quot;:462,&quot;bytes&quot;:25531,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/186635241?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Uzct!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 424w, https://substackcdn.com/image/fetch/$s_!Uzct!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 
848w, https://substackcdn.com/image/fetch/$s_!Uzct!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 1272w, https://substackcdn.com/image/fetch/$s_!Uzct!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Fig. 
5.1 in Astrom &amp; Wittenmark &#8220;Adaptive Control&#8221; shows the block diagram of a model-reference adaptive system (MRAS).</figcaption></figure></div><p>The arrows in the figure above reveal a degree of interconnectedness beyond the cascade connections we primarily reviewed in part 1. The adjustment mechanism can also act in discrete steps instead of continuously, or without a model, in which case it is called &#8220;gain scheduling&#8221;.</p><p>Self-improving learning systems are appearing in the news more frequently in the LLM world: Ilya Sutskever <a href="https://www.dwarkesh.com/p/ilya-sutskever-2">said in Nov 2025</a>, &#8220;There has been one big idea that everyone has been locked into, which is the self-improving AI&#8221;. The aforementioned Rho-Alpha model can update its weights while running, using teleoperation feedback. However, this can lead to a <a href="https://arxiv.org/abs/2510.15103">common side-effect</a> called &#8220;catastrophic forgetting&#8221;, because all the weights live in one huge monolithic structure, so updates need to be made either in judiciously chosen layers or in careful batches.</p><h3>Motor adaptation in practice</h3><p>One advantage of robotics pipelines is that they may (as we saw in part 1) have a hierarchical HL &#8594; LL structure. In such a situation, there are <em>motor</em> adaptations that can be integrated into the LL controller without impacting the behavior of the HL controller, sidestepping the catastrophic forgetting issue.</p><p>I&#8217;ll go over a few illustrative examples, and especially discuss their ability to handle unexpected conditions. 
If I missed a pertinent idea, let me know in the comments:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/is-it-learning-online-motor-adaptation/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation/comments"><span>Leave a comment</span></a></p><h4>Adaptation in model-based LL control: robot arms, drones, humanoids</h4><p>The pre-&#8220;end-to-end&#8221; era had many examples of adaptation in practice. Old ideas such as MRAC show up in industrial and commercial manipulators, for example in the <a href="https://www.universal-robots.com/manuals/EN/HTML/SW5_19/Content/prod-usr-man/software/PolyScope/content/installation_g5/Payload_en.htm">payload estimation</a> feature in Universal Robots arms. Commercial drones estimate wind to remain stable, sometimes <a href="https://arxiv.org/abs/2205.06908">using neural networks</a>. In a <a href="https://arxiv.org/pdf/1904.12306">2019 HyQ paper</a>, an explicit terrain compliance estimation module estimates parameters used by the LL controller. 
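</p><p>The pattern shared by these examples, an estimator feeding a physically meaningful parameter to a model-based LL controller, can be sketched on a toy problem. The Python below is only an illustration under assumptions of its own, not the method used by any of the systems cited: a 1-DoF vertical axis holds its height with an inverse-dynamics controller, acceleration is assumed to be measurable, and a simple gradient estimator updates the mass estimate after an unmodeled payload is picked up. The function name, gains, and masses are all hypothetical.</p><pre><code>def hold_position(true_mass=3.0, m_hat=1.0, gamma=0.5, dt=0.001, steps=5000):
    """Toy 1-DoF vertical axis: returns (final mass estimate, final height error).

    All numbers are illustrative. gamma is the adaptation gain; gamma=0 disables it.
    """
    g = 9.81              # gravity, m/s^2
    kp, kd = 100.0, 20.0  # PD gains, in acceleration units
    z, v = 0.0, 0.0       # height error (m) and velocity (m/s); the target is z = 0
    for _ in range(steps):
        # Model-based LL controller: inverse dynamics using the current mass estimate
        u = m_hat * (g - kp * z - kd * v)
        # Plant: the true mass includes the unmodeled payload
        a = u / true_mass - g
        # Adjustment mechanism: gradient step on the prediction error of u = m * (a + g)
        # (assumption: the acceleration a is measured)
        phi = a + g
        m_hat += gamma * phi * (u - m_hat * phi) * dt
        # Semi-implicit Euler integration of the plant
        v += a * dt
        z += v * dt
    return m_hat, z


if __name__ == "__main__":
    print("no adaptation:  ", hold_position(gamma=0.0))
    print("with adaptation:", hold_position())
</code></pre><p>With <code>gamma=0.0</code> the axis settles with a constant sag, because the gravity feedforward under-compensates for the payload. With adaptation on, the mass estimate converges and the sag disappears, without touching anything upstream of the LL controller.</p><p>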
In a 2023 demonstration of the Atlas robot using model-based controllers while picking up heavy objects, Atlas &#8220;<a href="https://spectrum.ieee.org/atlas-robot">has access to the mass properties</a>&#8221; of the object it is picking up, which I would lump into a gain-scheduling type of approach.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EYci!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EYci!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 424w, https://substackcdn.com/image/fetch/$s_!EYci!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 848w, https://substackcdn.com/image/fetch/$s_!EYci!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 1272w, https://substackcdn.com/image/fetch/$s_!EYci!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EYci!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png" width="589" height="162.3919523099851" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:185,&quot;width&quot;:671,&quot;resizeWidth&quot;:589,&quot;bytes&quot;:31996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/186635241?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EYci!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 424w, https://substackcdn.com/image/fetch/$s_!EYci!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 848w, https://substackcdn.com/image/fetch/$s_!EYci!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 1272w, https://substackcdn.com/image/fetch/$s_!EYci!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure from <a href="https://arxiv.org/pdf/1904.12306">Fahmi et al (2019)</a> showing adaptation module interfacing with model-based WBC.</figcaption></figure></div><p>In all these 
examples, the LL controller is model-based, which makes it easier to adapt to quantities like payload mass: it is clear where those terms appear in the controller. This is an advantage of having physically interpretable parameters, compared to black-box latent-space interconnections.</p><p>Mapping to computation:</p><pre><code>HL &#8594; WBC inverse dynamics/QP (CPU) &#8594; Joint/servo controllers (microcontroller/CPU) &#8594; Torques
                &#8593;
    *Adjustment mechanism (CPU/GPU)*</code></pre><h4>Meta-learning for adapting among training environments</h4><p>The concept of <a href="https://arxiv.org/pdf/1803.11347">meta-learning (2019)</a> is targeted at the motor adaptation problem, but needs samples over environments during training. This leads to the aforementioned prolonged training and large models, as well as susceptibility to truly unexpected (out-of-distribution) conditions. The authors of the paper are among the founders of Physical Intelligence, so it is possible that they could institute meta-learning-type methods for online adaptation in their action expert (not the case today as far as I can tell).</p><p>Mapping to computation of this hypothetical scenario:</p><pre><code>VLM (GPU) &#8594; Action expert *with internal model and meta-learning* (GPU/CPU) &#8594; Trajectory tracking (CPU) &#8594; Torques</code></pre><h4>Learning-based latent parameter estimation for locomotion</h4><p>As my sand locomotion example above might hint at, unexpected payload and terrain conditions are particularly prevalent in locomotion.</p><p><a href="https://ashish-kmr.github.io/rma-legged-robots/">RMA: Rapid Motor Adaptation (2021)</a> introduces a dedicated adaptation module that predicts a set of &#8220;latent parameters&#8221; that can adjust the action policy to better suit different conditions. 
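</p><p>To make that structure concrete, here is a minimal Python sketch of a two-rate loop, under heavy assumptions: a hand-written estimator stands in for the learned adaptation network, the &#8220;latent&#8221; is a single number, and the plant is a 1-D toy with an unknown constant disturbance. None of the names or numbers below come from the RMA paper.</p><pre><code>from collections import deque


def adaptation_module(history, dt):
    """Stand-in for a learned network: maps recent (x, u, x_next) history to a latent."""
    residuals = [(x_next - x) / dt - u for x, u, x_next in history]
    return sum(residuals) / len(residuals)


def policy(x, latent):
    """Stand-in for a learned base policy conditioned on the latent."""
    return -5.0 * x - latent


def rollout(adapt=True, disturbance=2.0, dt=0.01, steps=500, ratio=10):
    """Run the policy every step and the adaptation module every `ratio` steps.

    Returns (final state, number of latent updates). All values are hypothetical.
    """
    history = deque(maxlen=50)
    x, latent, updates = 0.0, 0.0, 0
    for i in range(steps):
        if adapt and i % ratio == 0 and history:
            latent = adaptation_module(history, dt)  # slow loop
            updates += 1
        u = policy(x, latent)                        # fast loop
        x_next = x + dt * (u + disturbance)          # toy plant; the policy never sees the disturbance
        history.append((x, u, x_next))
        x = x_next
    return x, updates


if __name__ == "__main__":
    print("without adaptation:", rollout(adapt=False))
    print("with adaptation:   ", rollout())
</code></pre><p>The point is only the wiring: the fast policy never observes the disturbance directly, and improves solely because the slower module refreshes the latent it is conditioned on.</p><p>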
The module is trained over varied conditions by randomizing them in simulation, so it potentially suffers from some of the same issues: out-of-distribution encounters and training difficulty.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uCHu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uCHu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 424w, https://substackcdn.com/image/fetch/$s_!uCHu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 848w, https://substackcdn.com/image/fetch/$s_!uCHu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 1272w, https://substackcdn.com/image/fetch/$s_!uCHu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uCHu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png" width="941" height="217" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:217,&quot;width&quot;:941,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:147962,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/186635241?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uCHu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 424w, https://substackcdn.com/image/fetch/$s_!uCHu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 848w, https://substackcdn.com/image/fetch/$s_!uCHu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 1272w, https://substackcdn.com/image/fetch/$s_!uCHu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">RMA figure from <a href="https://ashish-kmr.github.io/rma-legged-robots/rma-locomotion-final.pdf">their paper</a> showing adaptation module running at a lower 
rate.</figcaption></figure></div><p>One of the authors founded Skild.AI, and this quote from their <a href="https://www.skild.ai/blogs/one-policy-all-scenarios">Aug 2025 blog post</a></p><blockquote><p>A striking aspect of our model is that it is not just <em><strong>robust</strong></em>, but it is also <em><strong>adaptive</strong></em> and <em>graceful</em></p></blockquote><p>(emphasis theirs) suggests incorporation of something like RMA. Absent many details, here is my best guess at the composed pipeline mapped to computational hardware:</p><pre><code><code>HL action policy (GPU) &#8594; *Adaptation module (GPU)* &#8594; LL action policy (GPU) &#8594; Torques</code></code></pre><p>Where RMA had a large-ish latent vector, there are similar approaches toward predicting parameters with more physical meaning, from a <a href="https://www.science.org/doi/10.1126/scirobotics.ade2256">reduced</a> or a <a href="https://arxiv.org/abs/2202.05481">full state estimate</a>. These state-estimation networks learn to estimate base state and contact probabilities concurrently with the policy, enabling better perception of ground interactions.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aaYI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aaYI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 424w, https://substackcdn.com/image/fetch/$s_!aaYI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 
848w, https://substackcdn.com/image/fetch/$s_!aaYI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 1272w, https://substackcdn.com/image/fetch/$s_!aaYI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aaYI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png" width="623" height="220.03829787234042" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:249,&quot;width&quot;:705,&quot;resizeWidth&quot;:623,&quot;bytes&quot;:67371,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/186635241?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aaYI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 424w, 
https://substackcdn.com/image/fetch/$s_!aaYI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 848w, https://substackcdn.com/image/fetch/$s_!aaYI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 1272w, https://substackcdn.com/image/fetch/$s_!aaYI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure from <a href="https://www.science.org/doi/10.1126/scirobotics.ade2256">Choi et al (2023)</a> showing state estimation network utilized for locomotion on sand.</figcaption></figure></div><p>The end-result pipeline is quite similar, just potentially decomposing the adaptation module a bit:</p><pre><code><code>HL command &#8594; *History encoder (GPU) &#8594; Estimator (GPU)* &#8594; Actor (GPU) &#8594; Impedance control (CPU) &#8594; Torques</code></code></pre><h4>In-context learning to fix recent mistakes</h4><p>A different method called in-context learning (appearing in <a href="https://covariant.ai/insights/rfm-1-update-in-context-learning-to-improve-grasping/">Covariant.AI&#8217;s Mar 2024 blog post</a>, and in the <a href="https://arxiv.org/abs/2508.02062">RICL method from Aug 2025</a>) attends to recent <em>action history</em> as opposed to encoded observation history. 
These relevant demonstrations are added to the VLA context before its forward pass.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lCAt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lCAt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 424w, https://substackcdn.com/image/fetch/$s_!lCAt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 848w, https://substackcdn.com/image/fetch/$s_!lCAt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 1272w, https://substackcdn.com/image/fetch/$s_!lCAt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lCAt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png" width="658" height="289.19217081850536" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:494,&quot;width&quot;:1124,&quot;resizeWidth&quot;:658,&quot;bytes&quot;:252599,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/186635241?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lCAt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 424w, https://substackcdn.com/image/fetch/$s_!lCAt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 848w, https://substackcdn.com/image/fetch/$s_!lCAt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 1272w, https://substackcdn.com/image/fetch/$s_!lCAt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">RICL architecture from their <a href="https://arxiv.org/pdf/2508.02062">paper</a>, showing a new </figcaption></figure></div><p>The end-result pipeline adds a retrieval buffer of demonstrations before the VLA and an interpolation unit after the action module:</p><pre><code><code>*Retrieval buffer* &#8594; VLM (GPU) &#8594; Action expert (GPU/CPU) &#8594; *Action interpolation (CPU/GPU)* &#8594; Trajectory tracking (CPU) &#8594; Torques</code></code></pre><p>This method falls into a slightly different category: relevant demonstrations must first occur and be reflected upon before the system can adapt, in contrast to the potentially faster adaptation enabled by the previous methods. 
This strategy would not be sensible for time-sensitive or safety-critical tasks, but is categorically different and seemed worth reviewing.</p><h3>Closing thoughts</h3><p>In part 2 of this article series reviewing modern end-to-end robotics pipelines, we discussed why it may be useful to have some adaptation capability for fielded robots to handle unexpected conditions, and some examples of how it can be implemented. We also discussed some historical context from biology and control theory.</p><p>In part 3, we will try to get more hands-on and utilize what we learned from the first two parts to build up an effective pipeline from scratch. I&#8217;m still debating whether to use existing tools such as Isaac Sim or build even more from first principles for clarity, so it may take some time before we get there. If you have any suggestions or feedback, let me know in the comments. If you found this article interesting, please share and subscribe for future posts. Thanks for reading!</p><div class="subscription-widget-wrap-editor" 
data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>It would be interesting to compare the complexity of a model-based vs. neural network implementation of this function (maybe we can try that in part 3).</p></div></div>]]></content:encoded></item><item><title><![CDATA[The architecture behind “end-to-end” robotics pipelines]]></title><description><![CDATA[Part 1: Where the learning stack ends and the control stack begins]]></description><link>https://www.avikde.me/p/the-architecture-behind-end-to-end</link><guid isPermaLink="false">https://www.avikde.me/p/the-architecture-behind-end-to-end</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Mon, 26 Jan 2026 21:19:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!766O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This article is part of a series on end-to-end robotics pipelines:</em></p><ol><li><p>This article</p></li><li><p><a 
href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation?r=5vzx85">Online motor adaptation</a></p></li><li><p><a href="https://www.avikde.me/p/debugging-as-architecture-insight">Dissecting a VLA</a></p></li><li><p><a href="https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">Closing the action loop with a VLM &#8220;agent&#8221;</a></p></li><li><p><a href="https://www.avikde.me/p/building-a-reasoning-hierarchical">Demo combining the best features of end-to-end and classical approaches</a></p></li></ol><div><hr></div><p>Recent progress and excitement in humanoid robotics are largely driven by rapid gains in generalist capabilities. Historically, most robots were engineered for narrow, well-defined tasks. The current wave of companies, in contrast, is pursuing systems intended to operate across a broad range of activities, shifting both public and economic expectations toward robots that can serve as general-purpose physical agents.</p><p>A central part of this shift is the widespread claim of <em>end-to-end</em> pipelines, often described as going from &#8220;pixels to actions,&#8221; in contrast to earlier approaches built from hand-designed perception, planning, and control modules. This post examines what &#8220;end-to-end&#8221; means in practice: where the pipeline actually begins and ends, the tradeoffs between different architectural choices, and how the algorithms map to computing hardware.</p><p>Part 1 focuses on the &#8220;actions&#8221; side of &#8220;pixels to actions&#8221;: how learned systems interface with the physical control of the robot body. Part 2 will examine how these architectures adapt to environmental uncertainty and contact-rich interaction. 
Later parts will include hands-on comparisons using small standalone examples to make these differences concrete.</p><p><em>This publication and this post contain the author&#8217;s personal thoughts and opinions only, and do not reflect the views of any companies or institutions.</em></p><h3>Why &#8220;end-to-end&#8221;</h3><p>Classical AI was built on a strict separation of sensing, planning, and action. To my knowledge, the first robot to embody Sense-Plan-Act was <a href="https://en.wikipedia.org/wiki/Shakey_the_robot">Shakey the robot</a> (~1970), which also employed one of the first <a href="https://en.wikipedia.org/wiki/Stanford_Research_Institute_Problem_Solver">symbolic AI systems</a>. 
This tiered structure was so formative to robotics research that most research labs today are dedicated to different portions of this hierarchy, such as &#8220;perception&#8221;, &#8220;planning&#8221;, or &#8220;locomotion&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!766O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!766O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!766O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!766O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!766O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!766O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg" width="1200" height="900" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;SRI researchers Nils Nilsson (right) and Sven Wahlstrom with Shakey the Robot in the late 1960s.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="SRI researchers Nils Nilsson (right) and Sven Wahlstrom with Shakey the Robot in the late 1960s." title="SRI researchers Nils Nilsson (right) and Sven Wahlstrom with Shakey the Robot in the late 1960s." srcset="https://substackcdn.com/image/fetch/$s_!766O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!766O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!766O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!766O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Shakey the robot in the late 1960s (photo from <a href="https://spectrum.ieee.org/sri-shakey-robot-honored-as-ieee-milestone">here</a>).</figcaption></figure></div><p>The sense-plan-act view today is dying a very rapid death. 
The modern narrative of general-purpose robotics holds that modular pipelines often fail because of limitations imposed by this decoupling; for example, perception errors break planners, planners produce infeasible motions, and most importantly, interfaces encode wrong assumptions.</p><p>As influential AI researcher <a href="https://sergeylevine.substack.com/p/sporks-of-agi">Sergey Levine puts it</a>,</p><blockquote><p>for any learning-enabled system, any component that is <em>not</em> learned but instead designed by hand will eventually become the bottleneck to its performance</p></blockquote><p>End-to-end training avoids hand-designed intermediate representations, manually tuned cost functions, and any bottlenecks imposed by module interfaces.</p><p>Additionally, &#8220;end-to-end&#8221; sends a sociological signal: it suggests alignment with modern AI foundation models and scalability with data, and it positions the company as an AI lab instead of a controls shop.</p><h3>The action end in practice</h3><p>The practical reality of &#8220;end-to-end&#8221; is more subtle than it might seem. In this section we&#8217;ll review what some published academic and commercial implementations actually appear to be doing today, and also try to outline how the implementation is mapped to computational hardware.</p><h4>The old way: model-based stacks (~2014)</h4><p>It is very common to have a whole-body controller at the low level, as exemplified by the <a href="https://groups.csail.mit.edu/robotics-center/public_papers/Kuindersma14.pdf">2014 MIT Atlas team&#8217;s report</a>. 
After a high-level plan is created, a tracking controller is implemented as a quadratic program, and that generates the signals sent to the actuators:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NdCg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NdCg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 424w, https://substackcdn.com/image/fetch/$s_!NdCg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 848w, https://substackcdn.com/image/fetch/$s_!NdCg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 1272w, https://substackcdn.com/image/fetch/$s_!NdCg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NdCg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png" width="580" height="571" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ea62b32-540b-4439-9103-3401ae70d839_580x571.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:571,&quot;width&quot;:580,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58557,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/185869291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NdCg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 424w, https://substackcdn.com/image/fetch/$s_!NdCg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 848w, https://substackcdn.com/image/fetch/$s_!NdCg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 1272w, https://substackcdn.com/image/fetch/$s_!NdCg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 6 from the <a href="https://groups.csail.mit.edu/robotics-center/public_papers/Kuindersma14.pdf">2014 MIT Atlas team&#8217;s report</a> showing the low-level action pipeline, referred to as &#8220;Control.&#8221;</figcaption></figure></div><p>Mapping to computational hardware:</p><p><em>Trajectory optimizer (CPU) &#8594; WBC inverse dynamics/QP (CPU) &#8594; Joint/servo controllers (microcontroller/CPU) &#8594; Torques</em></p><h4>Learning followed by impedance controller (~2017-2020)</h4><p>To my knowledge, the first fielded robots using learning-based locomotion controllers appeared ~2018 from Google (using <a href="https://www.avikde.me/p/ghost-robotics-minitaur">Minitaur</a>) and in Marco Hutter&#8217;s group. 
As documented in the <a href="https://arxiv.org/pdf/1804.10332">2018 paper from Google</a> and the <a href="https://arxiv.org/pdf/1901.08652">highly-cited Hwangbo et al (2019) paper</a>, the most effective choice of action space was an impedance controller, a choice in turn influenced by <a href="https://arxiv.org/pdf/1611.01055">Peng et al (2017)</a>:</p><blockquote><p>Our experiments suggest that action parameterizations that include basic local feedback, such as PD target angles, MTU activations, or target velocities, can improve policy performance and learning speed across different motions and character morphologies</p></blockquote><p>The policy outputs desired joint positions and sometimes velocity offsets or gain modulation, and the applied torque is given by a simple algebraic equation:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tau = K_p(q_{des} - q) + K_d(\\dot{q}_{des} - \\dot{q})&quot;,&quot;id&quot;:&quot;WKLXTSXCPG&quot;}" data-component-name="LatexBlockToDOM"></div><p>The virtue of this architecture is that it is very generic, and succeeds in decoupling the fast timescales and discontinuities of making and breaking contact from the learning algorithm.</p><p>Mapping to computational hardware:</p><p><em>Policy eval (CPU/embedded GPU) &#8594; Impedance controller (CPU) &#8594; Actuators</em></p><h4>Figure AI&#8217;s &#8220;System 1&#8221; policy (2025)</h4><p><a href="https://www.figure.ai/news/helix">Figure AI&#8217;s Feb 2025 blog post</a> describes a &#8220;System 2 / System 1&#8221; design where a high-level vision-language model (S2) reasons about goals and semantics at low frequency, and a fast visuomotor network (S1) executes continuous control at high frequency. While this reflects a separation of timescales and roles, both modules are trained end-to-end with an abstract latent interface, meaning there is not a principled, physically interpretable handoff between high-level strategy and low-level control. 
As a result, Helix achieves generalization in perception and task reasoning but does not isolate physical control concerns (such as dynamics stabilization, contact interaction, or actuation abstraction) into structured model-based or classical control modules.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qh5X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qh5X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 424w, https://substackcdn.com/image/fetch/$s_!Qh5X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 848w, https://substackcdn.com/image/fetch/$s_!Qh5X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 1272w, https://substackcdn.com/image/fetch/$s_!Qh5X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qh5X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png" width="1322" height="596" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:596,&quot;width&quot;:1322,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:162629,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/185869291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Qh5X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 424w, https://substackcdn.com/image/fetch/$s_!Qh5X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 848w, https://substackcdn.com/image/fetch/$s_!Qh5X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 1272w, https://substackcdn.com/image/fetch/$s_!Qh5X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure AI&#8217;s architecture from <a href="https://www.figure.ai/news/helix">their blog post</a>.</figcaption></figure></div><p></p><p>In a <a href="https://www.figure.ai/news/reinforcement-learning-walking">Mar 2025 blog post</a>, they describe what sounds more like the impedance controller above than the system 1 design, so it&#8217;s possible some combination of both architectures is utilized:</p><blockquote><p>We additionally run the policy output through kHz-rate closed-loop torque control to compensate for errors in actuator modeling</p></blockquote><p>Mapping to computational hardware:</p><p><em>System 2 (Transformer, GPU) &#8594; System 1 (Network, GPU) &#8594; [Impedance control (CPU)] &#8594; Torques</em></p><h4>Physical Intelligence&#8217;s action expert (2025)</h4><p>The <a 
href="https://www.pi.website/research/knowledge_insulation">architecture described</a> is similar to the system 1 above, but specifically suggests that end-to-end training causes problems:</p><blockquote><p>When adapting a VLM to a VLA in this action expert design, the VLM backbone representations are exposed to the gradients from the action expert. Our experiments show that those gradients from the action expert lead to unfavorable learning dynamics, which not only results in much slower learning, but also causes the VLM backbone to lose some of the knowledge acquired during web-scale pre-training.</p></blockquote><p>This is loosely analogous to known training pathologies in deep nets such as <a href="https://en.wikipedia.org/wiki/Vanishing_gradient_problem">vanishing/exploding gradients</a>, where badly scaled gradient signals keep some layers from learning useful updates.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ll11!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ll11!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 424w, https://substackcdn.com/image/fetch/$s_!Ll11!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 848w, https://substackcdn.com/image/fetch/$s_!Ll11!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Ll11!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ll11!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png" width="1295" height="595" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:595,&quot;width&quot;:1295,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:68561,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/185869291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ll11!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 424w, https://substackcdn.com/image/fetch/$s_!Ll11!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 848w, https://substackcdn.com/image/fetch/$s_!Ll11!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 
1272w, https://substackcdn.com/image/fetch/$s_!Ll11!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Physical Intelligence&#8217;s architecture from <a href="https://www.pi.website/research/knowledge_insulation">their blog post</a>.</figcaption></figure></div><p></p><p>Another blog post describes issues to do with the mismatched control bandwidth of foundation model output to robot dynamics, solved by <a 
href="https://www.pi.website/research/real_time_chunking">outputting short-horizon trajectories</a> that are played out by a low-level controller.</p><p>Mapping to computational hardware:</p><p><em>VLM (GPU) &#8594; Action expert (GPU/CPU) &#8594; Trajectory tracking (CPU) &#8594; Torques</em></p><h4>Boston Dynamics + TRI&#8217;s pose tracking (2025)</h4><p>Their <a href="https://bostondynamics.com/blog/large-behavior-models-atlas-find-new-footing/">blog post describes</a> an architecture in which the higher-level cognitive layer outputs joint positions and end-effector poses. While there isn&#8217;t an explicit description of how these position setpoints are tracked, the post mentions Atlas&#8217;s MPC, and it is reasonable to assume that the MPC is the lower-level controller.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JBTF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JBTF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 424w, https://substackcdn.com/image/fetch/$s_!JBTF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 848w, https://substackcdn.com/image/fetch/$s_!JBTF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JBTF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JBTF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png" width="1024" height="372" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:372,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Our policy maps inputs consisting of images, proprioception and language prompts to actions that control the full Atlas robot at 30Hz. We leverage a diffusion transformer together with a flow matching loss to train our model.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Our policy maps inputs consisting of images, proprioception and language prompts to actions that control the full Atlas robot at 30Hz. We leverage a diffusion transformer together with a flow matching loss to train our model." title="Our policy maps inputs consisting of images, proprioception and language prompts to actions that control the full Atlas robot at 30Hz. We leverage a diffusion transformer together with a flow matching loss to train our model." 
srcset="https://substackcdn.com/image/fetch/$s_!JBTF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 424w, https://substackcdn.com/image/fetch/$s_!JBTF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 848w, https://substackcdn.com/image/fetch/$s_!JBTF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 1272w, https://substackcdn.com/image/fetch/$s_!JBTF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Boston Dynamics + TRI architecture from <a href="https://bostondynamics.com/blog/large-behavior-models-atlas-find-new-footing/">their blog post</a>.</figcaption></figure></div><p>Mapping to computational hardware:</p><p><em>LBM inference (GPU) &#8594; MPC (CPU) &#8594; Actuator torques</em></p><h4>1X&#8217;s inverse dynamics model IDM (2026)</h4><p>1X also describes a hierarchy in <a href="https://www.1x.tech/discover/world-model-self-learning">their blog post</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F2IG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F2IG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 424w, https://substackcdn.com/image/fetch/$s_!F2IG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 848w, https://substackcdn.com/image/fetch/$s_!F2IG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 1272w, 
https://substackcdn.com/image/fetch/$s_!F2IG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F2IG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png" width="1041" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1041,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:302337,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/185869291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F2IG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 424w, https://substackcdn.com/image/fetch/$s_!F2IG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 848w, https://substackcdn.com/image/fetch/$s_!F2IG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 
1272w, https://substackcdn.com/image/fetch/$s_!F2IG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">1X architecture from <a href="https://www.1x.tech/discover/world-model-self-learning">their blog post</a>.</figcaption></figure></div><p>World Model Backbone (WM): A text-conditioned video prediction model trained on internet-scale video data and fine-tuned on robot sensorimotor data. 
It predicts future visual states based on current observations and candidate actions.</p><p>Inverse Dynamics Model (IDM): Converts predicted future states into feasible robot action sequences that will produce those outcomes in the real world. The use of the term &#8220;inverse dynamics&#8221; suggests that the output actions are torques, though that isn&#8217;t specified.</p><p>Mapping to computational hardware:</p><p><em>World model (GPU) &#8594; IDM (GPU) &#8594; Actuator torques</em></p><h3>Why not end-to-end</h3><p>From the previous section, it is apparent that &#8220;end-to-end&#8221; doesn&#8217;t usually mean that a single algorithm or network maps pixels directly to torques. In this section, we&#8217;ll list some potential intuitive reasons for this.</p><h4>Separation of concerns</h4><p>We saw above in Physical Intelligence&#8217;s blog post that it is difficult to train an end-to-end policy that does so many different things. <a href="https://www.pi.website/research/knowledge_insulation">Another quote</a>:</p><blockquote><p>One hypothesis of why this is happening is the following. A pre-trained VLM, by its nature, pays attention to language inputs well. The gradients from the action expert now severly interfere with the model&#8217;s ability to process language, which leads the model to pick up on other correlations first.</p></blockquote><p>These difficulties are a side effect of one network trying to solve many different problems at once. 
The old Sense-Plan-Act schema enforced a separation of concerns very strictly, but even with a more relaxed architecture, low-level control priors drastically reduce the policy search space.</p><p>The human nervous system exhibits a similar separation, with a cortex (goal-directed commands), cerebellum (fast adaptation, prediction), spinal reflexes (fast control loops), and even mechanical impedance control in muscles and tendons.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x-HL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x-HL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 424w, https://substackcdn.com/image/fetch/$s_!x-HL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 848w, https://substackcdn.com/image/fetch/$s_!x-HL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!x-HL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!x-HL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg" width="900" height="644" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:644,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;This diagram shows the complete pathway a nerve impulse takes when a person tests the temperature of shower water with their hand. First, a sensory nerve ending in the index finger sends a nerve impulse to the spinal cord. A cross section of one segment of the vertebrae is shown from a superior view. The sensory nerve connected to the nerve ending is located in the dorsal root ganglion. The nerve ending is a dendrite of the sensory neuron, as it also has an axon that synapses with an interneuron. The interneuron then synapses with a second interneuron in the thalamus. This second interneuron synapses with brain tissue in the cerebral cortex, allowing conscious perception of the water temperature. The brain then initiates a motor command by stimulating an upper motor neuron in the cerebral cortex. The axon of the upper motor neuron extends all the way to the spinal cord, where it synapses with a lower motor neuron in the gray matter of the spinal cord. The impulse then travels down the lower motor neuron back to the hand where it synapses with the skeletal muscles of the hand. 
This triggers the muscle contractions that turn the dials of the shower to adjust the water temperature.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="This diagram shows the complete pathway a nerve impulse takes when a person tests the temperature of shower water with their hand. First, a sensory nerve ending in the index finger sends a nerve impulse to the spinal cord. A cross section of one segment of the vertebrae is shown from a superior view. The sensory nerve connected to the nerve ending is located in the dorsal root ganglion. The nerve ending is a dendrite of the sensory neuron, as it also has an axon that synapses with an interneuron. The interneuron then synapses with a second interneuron in the thalamus. This second interneuron synapses with brain tissue in the cerebral cortex, allowing conscious perception of the water temperature. The brain then initiates a motor command by stimulating an upper motor neuron in the cerebral cortex. The axon of the upper motor neuron extends all the way to the spinal cord, where it synapses with a lower motor neuron in the gray matter of the spinal cord. The impulse then travels down the lower motor neuron back to the hand where it synapses with the skeletal muscles of the hand. This triggers the muscle contractions that turn the dials of the shower to adjust the water temperature." title="This diagram shows the complete pathway a nerve impulse takes when a person tests the temperature of shower water with their hand. First, a sensory nerve ending in the index finger sends a nerve impulse to the spinal cord. A cross section of one segment of the vertebrae is shown from a superior view. The sensory nerve connected to the nerve ending is located in the dorsal root ganglion. 
The nerve ending is a dendrite of the sensory neuron, as it also has an axon that synapses with an interneuron. The interneuron then synapses with a second interneuron in the thalamus. This second interneuron synapses with brain tissue in the cerebral cortex, allowing conscious perception of the water temperature. The brain then initiates a motor command by stimulating an upper motor neuron in the cerebral cortex. The axon of the upper motor neuron extends all the way to the spinal cord, where it synapses with a lower motor neuron in the gray matter of the spinal cord. The impulse then travels down the lower motor neuron back to the hand where it synapses with the skeletal muscles of the hand. This triggers the muscle contractions that turn the dials of the shower to adjust the water temperature." srcset="https://substackcdn.com/image/fetch/$s_!x-HL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 424w, https://substackcdn.com/image/fetch/$s_!x-HL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 848w, https://substackcdn.com/image/fetch/$s_!x-HL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!x-HL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Nervous system components (figure from <a href="https://courses.lumenlearning.com/umd-publichealthbio/chapter/the-function-of-nervous-tissue/">here</a>).</figcaption></figure></div><h4>Training complexity</h4><p>Related to the separation of concerns above, an end-to-end network must learn contact mechanics, actuator dynamics, delays, friction, and impact stabilization, as well as task-level planning, all from one gradient signal.</p><p>This creates extremely long credit-assignment chains and high sample complexity. Hierarchical control factorizes the learning problem.</p><h4>Feedback control loops; tactile and force feedback</h4><p>With a fully end-to-end system, any feedback on how execution is going can only come in at the top. 
In contrast, a dedicated low-level control unit can run its own feedback controller that performs stabilization functions. This is in effect what we saw with the selection of the impedance controller in the Peng and Hutter papers above.</p><p>Secondly, a low-level controller provides an opportunity to incorporate a rich set of sensory signals such as tactile and force feedback. Rodney Brooks underlines the importance of non-visual feedback in his <a href="https://rodneybrooks.com/why-todays-humanoids-wont-learn-dexterity/">Sep 2025 essay</a>, going as far as to flag its absence as a roadblock. The problem is, if you must have force feedback in an end-to-end model, you first have to contend with the lack of large-scale force data to train it on, as well as the much larger end-to-end model you now have to train and then evaluate at inference time. As I responded to a Substack comment <a href="https://substack.com/@avikde/note/c-203946866?r=5vzx85&amp;utm_source=notes-share-action&amp;utm_medium=web">here</a>, a low-level control unit is a potential way to incorporate that data without increasing the dimensionality of the higher-level brain.</p><h4>Control bandwidth</h4><p>Real-world physics and dynamics don&#8217;t wait for end-to-end inference to complete, and most implementations (Physical Intelligence&#8217;s action chunking, Figure&#8217;s rate-decoupled system 1, etc.) 
need to decouple the control bandwidth of the cognitive layer from that of the low-level controller.</p><p><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Chris Paxton&quot;,&quot;id&quot;:232680664,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!13Dp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a886fd-347d-4694-b670-0253975d2ba9_659x547.png&quot;,&quot;uuid&quot;:&quot;e9159de1-8107-495d-825e-bdb80a0bb838&quot;}" data-component-name="MentionToDOM"></span> talks about this aspect as an action inference limitation in his excellent <a href="https://itcanthink.substack.com/p/vision-language-action-models-and">post about VLAs</a>, which you should read if you haven&#8217;t.</p><h4>Sim2real transfer</h4><p>As discussed in my recent <a href="https://www.avikde.me/p/the-ai-world-models-debate-and-its">world models post</a>, almost all of these implementations that utilize large-scale demonstration data need to follow it up with reinforcement learning post-training in simulation. This surfaces an issue known as &#8220;sim2real transfer,&#8221; where the simulator&#8217;s accuracy can limit the deployed behavior. There are a number of solutions, including domain randomization and actuator networks, but alternatively, a low-level controller can in many cases absorb modeling error through its inverse dynamics functionality. Physics errors affect torque-level policies massively, whereas impedance control, whole-body control, or model-predictive control use feedback to actively drive the resulting mismatch toward zero.</p><h4>Safety constraints</h4><p>We can explicitly add torque constraints, joint kinematic limits, and self-collision avoidance to a low-level controller. 
This is intuitively true, but here is an example of a <a href="https://umi-ft.github.io/">recent research paper</a> that found exactly this. Quoting the author:</p><blockquote><p>Introducing UMI-FT: the UMI gripper equipped with force/torque sensors (CoinFT) on each finger. Multimodal data from UMI-FT, combined with diffusion policy and compliance control, enables robots to apply sufficient yet safe force for task completion. </p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1u9p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1u9p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 424w, https://substackcdn.com/image/fetch/$s_!1u9p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 848w, https://substackcdn.com/image/fetch/$s_!1u9p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 1272w, https://substackcdn.com/image/fetch/$s_!1u9p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1u9p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png" width="633" height="271" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:271,&quot;width&quot;:633,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1u9p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 424w, https://substackcdn.com/image/fetch/$s_!1u9p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 848w, https://substackcdn.com/image/fetch/$s_!1u9p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 1272w, https://substackcdn.com/image/fetch/$s_!1u9p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">UMI-FT <a href="https://umi-ft.github.io/">research paper</a> architecture with explicit safety constraints in lower-level controllers.</figcaption></figure></div><h4>Generalization across hardware embodiment (*maybe)</h4><p>In principle, if the low-level controller completely abstracts the hardware, the higher-level brain&#8217;s functionality can be kept the same across different embodiments. Intuitively, you can reuse high-level policies if low-level layers abstract hardware, and you can improve low-level stability without retraining the ML model.</p><p>However, this intuitive point is difficult to verify because of how cognitive models are developed today. 
End-to-end pixel &#8594; action policies always incorporate some information about the embodiment, so it isn&#8217;t possible to train an abstract cognitive model. In practice, the foundation models of today train on <a href="https://www.pi.website/blog/pi0">cross-embodiment</a> data to obtain generalizable knowledge. To get to the bottom of this facet, we would need to understand what constitutes a cognitive model separate from embodiment, and that is not yet known, as discussed in my previous world models post:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;651f537e-8ea4-4af0-ab46-6866064a066c&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The AI world models debate and its foreshadowing on robotics&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Writing about safe, efficient AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-14T08:18:52.656Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Xry4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-ai-world-models-debate-and-its&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:184309659,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:4,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><h3>Closing thoughts</h3><p>With the end of Sense-Plan-Act, the new robotics north star is an end-to-end pipeline that does away with the need for any task-specific pipeline architecture or programming. However, today&#8217;s successful implementations tell a different story, and there are a number of intuitive reasons for this.</p><p>Foundation models excel at semantic, perceptual, and strategic reasoning, but they are mismatched to high-bandwidth, stability-critical motor control. A robust robotic architecture separates concerns into layers aligned with physical timescales and modeling regimes.</p><p>In this (part 1) article, we focused on standard visuomotor task execution. 
In part 2 of this series, we&#8217;ll look at how unexpected events and motor adaptation are handled in these architectures. After that, I&#8217;d like to continue the series with a standalone demonstration, published as an open-source repo, that examines a few of these architectures and compares them fairly.</p><p>If you found this post interesting, please let me know in the comments, share, and subscribe. Thanks for reading!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/the-architecture-behind-end-to-end/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/the-architecture-behind-end-to-end/comments"><span>Leave a comment</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/the-architecture-behind-end-to-end?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/the-architecture-behind-end-to-end?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[What von Neumann understood about the architecture of intelligence before we built AI]]></title><description><![CDATA[The Computer and the Brain anticipated both the successes and shortcomings of deep learning AI 70 years ago]]></description><link>https://www.avikde.me/p/what-von-neumann-understood-about</link><guid isPermaLink="false">https://www.avikde.me/p/what-von-neumann-understood-about</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Mon, 19 Jan 2026 19:17:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_hYZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My weekend read was &#8220;The Computer and the Brain&#8221;, an out-of-print book I picked up at the Strand Bookstore last year. John von Neumann wrote most of the contents in 1955 to prepare material for the Silliman lectures in 1956&#8212;an obligation that clearly meant a lot to him. He was diagnosed with bone cancer that year, but continued writing his notes in the hopes of being able to deliver them in some form. 
Tragically, he was never able to deliver the lectures, but his wife was able to collect and publish the partial manuscripts prefaced by a <a href="https://mathshistory.st-andrews.ac.uk/Extras/Von_Neumann_Silliman/">heart-wrenching letter</a>, and they would become his last words on these topics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_hYZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_hYZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_hYZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_hYZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_hYZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_hYZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg" width="600" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:81581,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/185086427?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_hYZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_hYZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_hYZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_hYZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I&#8217;ve known of von Neumann&#8217;s huge legacy in modern computing since a college computer organization course, but I was stunned at how much he was able to extrapolate into ideas about computation in general. His writings, from an era when transistor-based computers were only just emerging, remain relevant after 70 years of exponential growth in computing technology. He wasn&#8217;t correct about everything&#8212;that would be impossible&#8212;but the ways in which he was wrong are even more revealing and thought-provoking. They anticipate the reason deep learning has been so capable, and also predict the architectural limits we are now running into&#8212;memory bottlenecks, brute-force scale, and energy-hungry intelligence. 
They also anticipate the future directions we can go in to overcome these deficiencies.</p><p>The book is very short and absolutely worth a read if you can pick it up from a library or used bookstore, but I had four broad and powerful takeaways that contextualized decades of development for me.</p><ol><li><p><strong>Scale &amp; memory:</strong>  Basic operations force massive memory movement</p></li><li><p><strong>Precise vs. statistical:</strong>  Deep learning (DL) escapes numerical fragility by becoming brain-like </p></li><li><p><strong>Depth vs. architecture:</strong>  DL substitutes scale for structural sophistication in the brain</p></li><li><p><strong>Representation &amp; substrate:</strong>  DL is rigid where the brain is fluid</p></li></ol><p>I&#8217;ll explain these four aspects below, but together they point to the same overall thesis:</p><p>Modern AI succeeded by replicating the statistical aspect of natural computation, but suffers from brute-force scaling inside an architecture that von Neumann already suspected was fundamentally mismatched to cognition.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h3>1. 
Scale &amp; memory</h3><p>As the book says, the principle of &#8220;one organ for each basic operation&#8221; necessitates memory for intermediate values, on top of instruction and data memory. Von Neumann predicted that computation systems built from simple primitives can only scale by also scaling memory.</p><p>Scalar CPU architecture is still very close to von Neumann&#8217;s artificial automaton. Post-von-Neumann architectures include systolic arrays (TPUs) and near-memory compute; GPUs are a bit of a hybrid, with shared memory (scratchpads) and tiled matrix multiply (data reuse). Even heavily optimized post-von-Neumann machines are still dominated by data movement, because the algorithmic structure forces it.</p><p>Modern deep learning vindicates this: intelligence is achieved not through complex operations but through scale, which makes memory movement, not computation, the central bottleneck of contemporary hardware. We have been talking about the <a href="https://ieeexplore.ieee.org/document/10477550">AI memory wall</a> for a few years, but it follows directly from these predictions made 70 years ago.</p><p>A related aspect which von Neumann couldn&#8217;t have anticipated was the energetic impact of memory access. He did write about the energetic cost of logic operations, but today, fetching an operand from DRAM costs far more energy than computing on it, to the tune of 100&#215; the energy of a multiply. This pressure is driving technology development in near-memory compute, in-memory analog MACs, and optical interconnects. The same architectural tension that von Neumann identified now drives the economics<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> of AI hardware.</p><h3>2. Precise vs. statistical</h3><p>One of the central topics in the book is how digital computation needs very high precision because of the high arithmetic depth of repeated basic operations. 
If each operation has error &#949;, then after N steps the accumulated error can grow as O(N&#949;) in the worst case. Deep networks have <em>extreme arithmetic depth</em>, with dozens to hundreds of layers and trillions of operations. However, empirically, 4-bit quantization in deep learning <a href="https://developer.nvidia.com/blog/optimizing-llms-for-performance-and-accuracy-with-post-training-quantization/">works with almost no drop in accuracy</a>.</p><p>Why doesn&#8217;t error compound the way von Neumann predicted?</p><p>The key: von Neumann analyzed precise numerical methods (like solving equations or integrating trajectories), but neural networks are different in a couple of important ways:</p><ol><li><p>Noise is inherent in the training process, resulting in a function approximator that is robust to input noise.</p></li><li><p>In the accumulation step, the errors are mixed across thousands of dimensions, clipped by nonlinear saturating functions, and averaged out statistically.</p></li></ol><p>The overall error is clamped and damped, and does not propagate in the way that von Neumann assumed.</p><p>Relatedly, von Neumann argued that the brain works with low precision (1-10 bits), and performs a different type of computation than digital computers (32-64 bits). He referred to the brain as performing &#8220;statistical computing&#8221;. So deep learning does not violate von Neumann&#8217;s analysis; it is <strong>occupying the biological side of his dichotomy</strong>.</p><h3>3. Depth vs. architecture</h3><p>Von Neumann emphasizes three biological facts about neurons:</p><ol><li><p><strong>Low precision</strong></p></li><li><p><strong>Low speed</strong> (~10 ms per spike, though they can respond slightly faster under extreme stimulation)</p></li><li><p><strong>Shallow circuits</strong></p></li></ol><p>We discussed the precision above; let&#8217;s dig into the others next. 
The nervous system is very slow, with each &#8220;layer&#8221; taking on the order of 10 ms to fire and reset (compared to digital lines changing state in &lt; 1 ns). This means that while a &#8220;deep&#8221; digital computation is feasible, the same depth would be infeasible in a natural system.</p><p>The shallowness is also important: a crucial example in the book is that the retina does significant computation using three synapse layers, far fewer than are needed for <a href="https://towardsdatascience.com/image-classification-with-vision-transformer-8bfde8e541d4">modern Vision Transformer (ViT) encoders</a> (dozens of layers, up to billions of parameters).</p><p>How is this possible? The answer is that a biological neuron is not a basic linear unit + nonlinearity; it is more like a <strong>small analog computer</strong>. Each neuron has temporal dynamics, neuromodulators, and plasticity rules. Its connections are even more complex: each neuron has hundreds of synapses and integrates activations nonlinearly, potentially depending on their spatial and geometric relations.</p><p>So the contrast is stark:</p><ul><li><p>The brain has <em>shallow</em> compositions of <em>slow</em> and <em>low-precision</em> units</p></li><li><p>Deep nets have <em>very deep</em> compositions of <em>very fast</em> and <em>medium-low</em>-precision units</p></li></ul><p>Von Neumann predicted that these fundamental differences in the basic blocks would result in different natural vs. artificial computing paradigms:</p><blockquote><p>Hence the logical approach and structure in natural automata may be expected to differ widely from those in artificial automata.</p></blockquote><p>Modern deep learning compensates for architectural simplicity with scale. Biology compensates for slow, noisy hardware with architectural sophistication and better primitives. 
This distinction strongly influences why our systems are large, power-hungry, data-hungry, and memory bound.</p><p>I hadn&#8217;t anticipated this connection when I started reading the book, but my article from last week also visits this architectural distinction from a world-model-representation perspective:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;c01c2482-9885-4fc0-b992-e9590bd3f4eb&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The AI world models debate and its foreshadowing on robotics&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Writing about safe, efficient AI -- Robotics Ph.D. and founder&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-14T08:18:52.656Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Xry4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-ai-world-models-debate-and-its&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:184309659,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:2,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s
3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Why did we choose scaling of simple units for computing? Among other reasons (as discussed in the previous article), deep learning was built around universality + scalability, not biological realism. Simple units have advantages: they are easy to parallelize, easy to implement on GPUs, and easy to map to silicon.</p><p>What does this mean for general artificial intelligence? Von Neumann suspected that digital logic gates were too primitive to model cognition efficiently, and with today&#8217;s technology it is certainly true that the brain&#8217;s performance at 10 W cannot be matched even at much higher power.</p><h3>4. Representation &amp; substrate</h3><p>Von Neumann observes that quantities passing through the nervous system may switch between digital and analog representations repeatedly. They can also be represented with adaptive precision<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. In contrast, digital machines commit very early to fixed-width numbers everywhere (FP32, FP16, INT8, etc.), and even &#8220;mixed precision&#8221; is coarse and static.</p><p>These points seem to suggest another architectural dichotomy (not just in the connections between units, but also in how numerical quantities are represented). The brain has <em>adaptive</em> primitives, precisions, and numerical representations, whereas they are all <em>fixed</em> in the digital computing paradigm.</p><p>Is the answer analog computing? Von Neumann himself rejected naive analog computing due to its problems of scalability and reliability. 
The brain may be powerful while being efficient because it is <em>representationally flexible</em>, not because it is analog per se.</p><p>Neuromorphic computing is exactly about this axis, with conceptual departures such as event-driven computation, mixed analog/digital circuits, co-located computation and memory. My knowledge of the field is limited and I am not sure that any of the existing research in that area truly captures what von Neumann was hinting at, but I suspect that in the long-term future of this publication, neuromorphic computing will come up again.</p><div><hr></div><p>Thanks for reading! Let me know if you&#8217;d suggest any related historical or modern writing on this topic, and please share and subscribe if you liked the essay.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/what-von-neumann-understood-about?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/what-von-neumann-understood-about?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>See, for example, the <a href="https://groq.com/newsroom/groq-and-nvidia-enter-non-exclusive-inference-technology-licensing-agreement-to-accelerate-ai-inference-at-global-scale">Groq-NVIDIA</a> deal and <a href="https://www.tomshardware.com/pc-components/ram/data-centers-will-consume-70-percent-of-memory-chips-made-in-2026-supply-shortfall-will-cause-the-chip-shortage-to-spread-to-other-segments">DRAM shortages</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>A nice example of this is the &#8220;average pulse frequency&#8221; interpretation of a sequence of quasiperiodic pulses. 
Coarse spike counts suffice for rough decisions, and temporal averaging increases accuracy automatically.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[The AI world models debate and its foreshadowing on robotics]]></title><description><![CDATA[Plus, five facets of comparison for the two approaches]]></description><link>https://www.avikde.me/p/the-ai-world-models-debate-and-its</link><guid isPermaLink="false">https://www.avikde.me/p/the-ai-world-models-debate-and-its</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Wed, 14 Jan 2026 08:18:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Xry4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Large language model (LLM)-based tools such as chatbots, coding assistants, and writing aids have become widely adopted and have had significant cultural and economic impact and utility. At the same time, the conversation continues about what kinds of progress these models represent and what their limitations may be. 
One of the central questions in this discussion is whether &#8220;scaling&#8221; improvements in LLMs (primarily achieved through larger models and larger training datasets) can lead to general intelligence, or whether additional architectural or conceptual advances will be required.</p><p>In parallel with these debates, especially on the heels of numerous announcements at CES 2026, the cultural focus is increasingly shifting toward robotics, or &#8220;physical AI&#8221;. Is there a physical equivalent to this intellectual debate between scaling and structured models?</p><p>Here, we&#8217;ll go over some of the key aspects of this intellectual and conceptual spectrum, starting with the informational world, and then examine the implications of the equivalent schools of thought in the physical world.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><em>This publication and this post contain the author&#8217;s personal thoughts and opinions only, and do not reflect the views of any companies or institutions.</em></p><h2>Today&#8217;s AI is a product of scaling a simple architecture (mostly)</h2><p>To break down this heading: by &#8220;today&#8217;s AI,&#8221; I&#8217;m referring to the most pervasive products, such as chatbots, search, and coding and writing assistants. These systems are typically based on large transformer architectures composed of many repeated layers and trained on vast datasets, with models today having hundreds of billions of parameters.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> In simplified form, these systems operate by mapping input tokens into embeddings, processing them through a stack of transformer blocks, and producing probability distributions over possible next tokens via a final linear projection and softmax layer.</p><p>Since the initial release of ChatGPT, the dominant trend in the development of these models has been to increase their size and the amount of data used for training, rather than to introduce fundamentally new architectural principles.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GOex!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GOex!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GOex!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GOex!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GOex!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GOex!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg" width="594" height="500" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:594,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Transformer model size over time&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Transformer model size over time" 
title="Transformer model size over time" srcset="https://substackcdn.com/image/fetch/$s_!GOex!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GOex!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GOex!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GOex!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Figure from <a href="https://blogs.nvidia.com/blog/what-is-a-transformer-model/">NVIDIA</a> about LLM scaling</figcaption></figure></div><p>Given this architectural simplicity, the range of capability expressed by LLM-based tools is frankly impressive. Much of this capability arises from the interaction between large model size and extensive training data, rather than from task-specific design and bespoke computational structures.</p><p><a href="https://blog.samaltman.com/three-observations">Sam Altman&#8217;s early 2025 blog post</a> and the empirical observations of companies building LLMs added evidence for, and raised expectations of, continued scaling of intelligence in this way. These observations led to a &#8220;scale is all you need&#8221; movement that has had an enormous impact on our society and economy, with <a href="https://www.mckinsey.com/industries/private-capital/our-insights/scaling-bigger-faster-cheaper-data-centers-with-smarter-designs">$1.7 trillion of projected investment by 2030</a>.</p><p>The larger debate we&#8217;re looking at in this post is about the prediction that scale is <em>sufficient</em> (more below), but it is also important to ask if it is <em>necessary</em>. That is, 
<strong>is scale </strong><em><strong>required</strong></em><strong> to exhibit the same progress?</strong> The answer to this is likely yes; as stated in a <a href="https://osf.io/preprints/psyarxiv/c5gh8_v1">Dec 2025 preprint by Quattrociocchi et al.</a>, when the models are restricted to the transformer architecture described above, it appears to be true that &#8220;their apparent intelligence emerges only under conditions of massive scale&#8221;.</p><p>Another natural question is <strong>why there has been so much investment into exploiting scaling</strong>, vs. exploration of other architectures. One reason is that progress is consistent and predictable (even suggesting scaling &#8220;laws&#8221; as in the Altman blog post), which enables predictable engineering and financial projections. By contrast, developing new architectures is a relatively unpredictable and risky process. Another very prominent virtue is that simple architectures are much easier to integrate with other parts of the engineering stack, which has been key for the <a href="https://chipinsights.net/p/the-alphabet-soup-of-processors">adoption</a> of hardware acceleration for deep learning.</p><p>Many leading researchers such as Demis Hassabis, Geoffrey Hinton, and teams at OpenAI and Anthropic maintain that scaling remains a primary driver of progress.</p><h2>The other side of the AI debate</h2><p>Over the recent past, there has been an increasing number of arguments disagreeing with the claim that scaling is sufficient to get to arbitrary &#8220;intelligence.&#8221;</p><p>Per the March 2025 <a href="https://www.nature.com/articles/d41586-025-00649-4">findings</a> of the annual meeting of the AAAI, including responses from more than 475 members (67% of them academics),</p><blockquote><p>More than three-quarters of respondents said that <a href="https://www.nature.com/articles/d41586-023-00641-w">enlarging current AI systems &#8213; an approach that has been hugely successful</a> in 
enhancing their performance over the past few years &#8213; is unlikely to lead to what is known as artificial general intelligence (AGI).</p></blockquote><p>Well-respected AI researchers are starting to form the next wave of AI companies that try to encode some kind of &#8220;world model&#8221; or semantic understanding of the world: Dr. Fei-Fei Li&#8217;s World Labs generates images and videos but <a href="https://spectrum.ieee.org/fei-fei-li-world-labs">only via an intermediate representation of a 3D world</a>. Yann LeCun&#8217;s new startup <a href="https://techcrunch.com/2025/12/19/yann-lecun-confirms-his-new-world-model-startup-reportedly-seeks-5b-valuation/">AMI Labs is likely also building world models</a> via some form of his published JEPA work. Ilya Sutskever (one of OpenAI&#8217;s founders, who contributed substantially to Sam Altman&#8217;s perspective above) <a href="https://www.dwarkesh.com/p/ilya-sutskever-2">went on Dwarkesh&#8217;s podcast</a> and said that scaling alone would not carry us to AGI and that &#8220;something crucial is missing.&#8221; Cognitive scientist Gary Marcus <a href="https://garymarcus.substack.com/">frequently writes</a> about the need for symbolic reasoning in AI and is often in the thick of the debate on how to get there.</p><h3>What is a world model?</h3><p>There is at present no clearly victorious architecture for encoding added structure in large AI models. 
Consider a few examples from the AI world:</p><ul><li><p><a href="https://www.worldlabs.ai/">World Labs</a>, whose product generates consistent images and video, would define a world model as metric information about a 3D scene</p></li><li><p>Many AI researchers use a working definition of a world model as a <a href="https://itcanthink.substack.com/p/what-are-robot-world-models">(potentially latent-space) dynamical model that predicts how the state of the world evolves under actions</a>.</p><ul><li><p>Ha and Schmidhuber wrote a <a href="https://arxiv.org/pdf/1803.10122">paper about world models in 2018</a>, with the working definition as &#8220;predicting future sensory data given our current motor actions&#8221;</p></li><li><p>Yann LeCun proposes learning and predicting latent-space dynamics in his JEPA research (papers 2022-2025); crucially, the projection to latent space is also learned from data, making it more general but less grounded in physical laws</p></li><li><p>The 1x world model is <a href="https://www.1x.tech/discover/world-model-self-learning">described in Jan 2026</a> as having latent-space prediction capability and being used to generate predicted future video states</p></li><li><p><a href="https://arxiv.org/pdf/2506.01622">DeepMind&#8217;s 2025 paper</a> also seeks a &#8220;predictive model of its environment&#8221;; in the paper it is a Markov process, but for a continuous system such as a robot, it would be continuous or discretized dynamics governed by physics. 
It does not, however, specify how one would design architectures to take advantage of world models: &#8220;Future work should explore developing scalable algorithms for eliciting these world models and using them to improve agent safety.&#8221;</p></li></ul></li></ul><p>Zooming out to broader science, models have been developed and used in almost all fields; biologists have been <a href="https://openlibrary.org/books/OL2049287M/The_organization_of_learning">discovering models for navigation</a> in animal brains, physicists have been developing models for the behavior of the universe from quantum to astronomical scales for centuries, civil engineers have been using models of mechanics to build our houses and bridges, etc. Gary Marcus <a href="https://garymarcus.substack.com/p/generative-ais-crippling-and-widespread">defines</a> a cognitive world model as &#8220;a computational framework that a system (a machine, or a person or other animal) uses to track what is happening in the world &#8230; persistent, stable, updatable (and ideally up-to-date) internal representations of some set of entities within some slice of the world.&#8221; Each of these parties would likely have different opinions on models of the world / universe that AI should be imbued with.</p><p>In this post, we&#8217;ll stay focused on whether the added structure is important, but not discuss the relative merits of these varied proposals. 
(That is a potential topic for future posts; make sure to subscribe to get notified when they get published.)</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h3>Why do we need world models?</h3><p>The critical view is that LLMs are designed to predict what to do next but are not designed to build an underlying semantic understanding, and that many examples of errors (or &#8220;hallucinations&#8221;) can ultimately be traced back to this gap:</p><p>LLMs can <a href="https://garymarcus.substack.com/p/generative-ais-crippling-and-widespread">parrot rules of chess but will make illegal moves</a> at the same time; they do not generalize well to <a href="https://saanyaojha.substack.com/p/the-man-who-cant-be-moved">out-of-training scenarios or under uncertainty</a> and can produce unpredictable responses to uncommon inputs such as <a href="https://www.plough.com/en/topics/life/technology/computers-cant-do-math">SolidGoldMagikarp</a>; and they exhibit &#8220;<a href="https://arxiv.org/pdf/2408.06518v3">semantic leakage</a>&#8221; of concepts and semantics in their input streams, with <a href="https://www.fox13now.com/news/local-news/summit-county/how-utah-police-departments-are-using-ai-to-keep-streets-safer">real-world impacts on usage of AI for policing</a>. 
While capabilities of LLMs do keep increasing, there is concern that errors such as these cannot be universally eradicated without an architectural shift.</p><h2>Is there an equivalent debate in robotics?</h2><p>Humanoid robotics in particular has been having a prominent rise into the <a href="https://www.cnbc.com/2026/01/09/humanoid-robots-take-over-las-vegas-at-ces-tech-touts-future-of-ai.html">cultural</a> and <a href="https://techcrunch.com/2025/09/16/figure-reaches-39b-valuation-in-latest-funding-round/">economic</a> consciousness in the last few years. Humanoids have been featured at <a href="https://www.nvidia.com/en-us/on-demand/session/gtc24-s62542/">NVIDIA keynotes for about two years</a> now, clearly signaling that the time is here for robotics companies to show their products and get mass-market adoption. While the field of robotics has existed for a long time, it is undeniable that the capabilities demonstrated have been seeing large improvements along with this increased exposure to the public eye.</p><p>Does the same architectural divide we just discussed for LLMs also exist in robotics? 
Less is known (much less agreed upon) about the best way to develop advanced capabilities in these robots, but we can use public information from some companies that have made product announcements to guess some patterns:</p><ul><li><p>The Boston Dynamics CEO <a href="https://www.businessinsider.com/huamnoid-robots-manufacturing-deployment-timeline-robert-playter-ceo-interview-2026-1">says</a> that they &#8220;need to be able to bring a new task to bear in a day or two &#8230; because, I think in a factory, there&#8217;s literally hundreds of tasks and the tasks evolve,&#8221; and their <a href="https://www.cbsnews.com/news/boston-dynamics-ai-powered-humanoid-robot-learning-factory-work-60-minutes-transcript/">60 minutes feature</a> shows the ability to rapidly deploy motion capture or VR demonstration data to their Atlas robot</p></li><li><p>Figure describes its &#8220;<a href="https://www.figure.ai/news/project-go-big">Project Go-Big</a>&#8221; as an effort to collect human demonstration data in the form of first-person video for pre-training<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> a navigation model</p></li><li><p>1x <a href="https://www.wsj.com/tech/personal-tech/i-tried-the-robot-thats-coming-to-live-with-you-its-still-part-human-68515d44">described</a> its plan to collect teleoperated demonstration data with its robot in people&#8217;s homes for continued training of its AI model in Oct 2025, and released an <a href="https://techcrunch.com/2026/01/13/neo-humanoid-maker-1x-releases-world-model-to-help-bots-learn-what-they-see/">update in Jan 2026</a> suggesting learning from internet-scale videos as demonstration followed by RL in simulation</p></li></ul><p>I want to note that all these companies have very intelligent researchers and engineers on their staff, and it is very possible (and likely) that there is more going on in these particular demos; I only include these 
specific reference points as context to pick out broad themes. Some emerging patterns are that (a) the rate at which different tasks are demonstrated is a high priority for these companies, (b) many of them are looking to pre-training with motion data collected from humans, and (c) this will be followed by post-training using reinforcement learning (most likely in simulation) where the system&#8217;s reward will include matching the demonstration.</p><p>My rough summary here is largely echoed by Rodney Brooks in his <a href="https://rodneybrooks.com/why-todays-humanoids-wont-learn-dexterity/">2025 post on humanoid robot dexterity</a>:</p><blockquote><p>How the humanoid companies and academic researchers have chosen to do this is largely through having a learning system watch movies of people doing manipulation tasks, and try to learn what the motions are for a robot to do the same tasks. In a few cases humans teleoperate a robot, that they can see, along with the objects being manipulated ...</p></blockquote><h3>A robotics parallel of LLM development</h3><p>Very roughly, the training processes for both status-quo approaches have similar-looking steps:</p><ul><li><p><strong>pre-training</strong> - reading internet-scale text (LLMs), vs. watching internet-scale human demonstration video or motion data (robots);</p></li><li><p><strong>post-training</strong> - RLHF and its modern equivalents (LLMs), vs. RL in simulation followed by sim-to-real porting and deployment (robots)</p></li></ul><p>With this grounding, we can ask <strong>whether robotics applications will run into the same problems and debates</strong> as we discussed for LLMs above.</p><p>One unknown is whether motion data is the best analogue of text data. Rodney Brooks articulates some concerns about this in his dexterous manipulation essay, suggesting that tactile sensing data is needed (but internet-scale tactile sensing data, or any other kind of robot data, doesn&#8217;t exist). 
It is likely that all the robots will <a href="https://www.figure.ai/news/introducing-figure-03">include tactile sensors</a> in some form, but it isn&#8217;t clear yet how they will fit into this large-data human-demonstration paradigm.</p><p>The larger question is whether a navigation capability trained with motion data will generalize to unseen and unexpected situations, since it is not designed to encode an explicit understanding of &#8220;objects&#8221; or &#8220;inertias&#8221; or &#8220;positions&#8221;. This concern mirrors the concerns about semantic understanding in LLMs. It is likely that the rate of this class of error will go down with larger models trained with more data (effectively, the scaling argument). To accomplish that goal, the &#8220;<a href="https://itcanthink.substack.com/p/how-can-we-get-enough-data-to-train">robot data gap</a>&#8221; will need to be closed, which will take a lot of compute for data generation and training larger models due to the large dimensionality of the sensory and action spaces in robotics.</p><p>It is also relatively more difficult to &#8220;scale up&#8221; in robotics for several reasons. First, latency and real-time reaction are much more important than in a chatbot setting, and so increasing model size at the cost of latency is not viable. In <a href="https://www.figure.ai/news/helix">Figure&#8217;s Feb 2025 blog post</a>, we can see that a 7B-parameter VLM is used, at a time when much larger (and presumably more accurate) models were available, and 1x states that <a href="https://www.1x.tech/discover/world-model-self-learning">11 seconds of thinking are required for 5-second tasks</a>. Second, as Chris Paxton has written about <a href="https://itcanthink.substack.com/p/how-can-we-get-enough-data-to-train">many</a> <a href="https://itcanthink.substack.com/p/what-are-the-data-scaling-laws-for">times</a>, getting diverse and useful data to feed a larger model poses many challenges. 
Third, robots need to carry their own battery packs, and so adding a larger GPU to run larger models introduces runtime and thermal management concerns.</p><p>On the other hand, the architecture (albeit with many details glossed over) seems to be consistent across many tasks and does not require too many architectural decisions to be made or parameters to be tuned (except for training metaparameters). It also allows for a myriad of types of demonstrations to be stood up quickly for garnering buy-in and support from customers or investors, which is a significant benefit. This is understandably a parallel of some of the observations that led to the scaling-based improvements of LLMs.</p><h3>World models in robotics</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xry4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xry4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Xry4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Xry4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Xry4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xry4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png" width="480" height="320.1098901098901" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:480,&quot;bytes&quot;:629810,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/184309659?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Xry4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Xry4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!Xry4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Xry4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Generated by ChatGPT</figcaption></figure></div><p>First, we must observe that the post-training process described 
above will typically use simulation environments (with simulated physics). Although this approach appears to be model-free, the properties of the simulator (which itself uses physics models) are implicitly embedded into the learned policy.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> The 2025 DeepMind paper referenced above suggests that it may be possible to prove that this implicitly captured information can be used to extract an explicit physics model after training.</p><p>So, does that mean we should put world models out of mind and learn an implicit one as needed (or not)? Well, this is a very inefficient way to learn physics equations and parameters: the Euler-Lagrange equations, together with classical <a href="https://en.wikipedia.org/wiki/System_identification">system ID</a> or <a href="https://en.wikipedia.org/wiki/Adaptive_control">adaptive control</a> methods, may be able to capture the same model much more easily and in a way that generalizes better. RL in general can require a large amount of training for the results it produces because <a href="https://itcanthink.substack.com/p/the-limits-of-reinforcement-learning">rewards are typically sparse</a>. In other words, an RL-trained policy must possess &#8220;<a href="https://garymarcus.substack.com/p/generative-ais-crippling-and-widespread">a lot of knowledge, and in some ways far more than most, if not all, humans</a>,&#8221; to reach human-level performance on a specific task. 
Of course, humans rely on enormous evolutionary and developmental pretraining, and encoding that pretraining into specialized structures is exactly the pro-world-models argument in the debate.</p><p>In terms of the models themselves, for tasks like locomotion, Newtonian physics is very well understood, and roboticists have been building on it to <a href="https://inria.hal.science/hal-02487855/file/Chapter.pdf">develop and use models like ZMP and LIP for decades</a>. For more abstract control systems, the concept of a &#8220;plant model&#8221; in control theory is not dissimilar to the abstract state-prediction models referred to in the section above.</p><p>Classical methods that utilize these models include trajectory optimization subject to the model and model-predictive control. These can impose constraints on future states, so that within the bounds of the model&#8217;s accuracy, some aspects of safety can be encoded in a way that isn&#8217;t possible otherwise.</p><h2>How to compare the two approaches</h2><p>Now that we can recognize the &#8220;world model&#8221; debate in applications for informational and physical AI, it&#8217;s helpful to know, in rough, broad strokes, how to compare the two strategies from a number of perspectives:</p><ol><li><p><strong>Performance: </strong>Can the method produce results that are compelling? There are umpteen benchmarks to compare language models. The next generation of world-model-equipped LLMs isn&#8217;t here yet, so we&#8217;ll have to wait a little while to see how they stack up. There aren&#8217;t robotics benchmarks of the sort yet, though some <a href="https://generalrobots.substack.com/p/benjies-humanoid-olympic-games">informal efforts are underway</a>.</p></li><li><p><strong>Scalability and time-to-market: </strong>This is a huge advantage of scaling a simple architecture. 
Deep neural networks with consistently-repeating matrix multiplication and reduction primitives have been mapped to SIMT processors like GPUs and systolic array processors (NPUs, TPUs) with incredible performance gains. At the moment there is not even enough information about non-trivial architectures to consider mapping them to computational hardware. It is also possible that world models can be mapped into the existing computational frameworks (and we can assume that the first generation of them will have to do so to compete). Eventually, if the computations are quite different, modified paradigms and accelerators may be needed, and scaling those may require more care and thought than the straightforward process we have followed for scaling LLMs. Based on the current state of language models and humanoid robotics as recapped above, it is clearly easier to get initial proofs-of-concept working with model-free approaches that scale a simple architecture.</p></li><li><p><strong>Computational efficiency: </strong>Newton&#8217;s equations describe the motion of bodies with very few parameters in great generality, and it is impractical to capture them with a &#8220;transformer-like&#8221; structure without a significantly higher number of parameters. This is especially true where equations are discontinuous, which happens in robotics problems like locomotion and manipulation. AI is currently up against the so-called &#8220;<a href="https://arxiv.org/abs/2403.14123">memory wall</a>&#8221; because these models need to be so large, and the most recent innovations and <a href="https://groq.com/newsroom/groq-and-nvidia-enter-non-exclusive-inference-technology-licensing-agreement-to-accelerate-ai-inference-at-global-scale">movements</a> in ML accelerators have been aimed at addressing it. 
Utilizing appropriate models with differently-architected communications may completely sidestep this memory wall, as well as drastically improve the efficiency<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> of equivalent computations.</p></li><li><p><strong>Generalization:</strong> It should be clear that some of the models that need to be learned for robot motion have very applicable and general models that have been known for centuries, and the same holds for biologists, cognitive scientists, and psychologists in their fields. Ilya Sutskever, one of the architects of the current LLM era, <a href="https://www.dwarkesh.com/p/ilya-sutskever-2">says</a> that their structure is weak at generalization and that generalizing in the way that humans can needs new architectures. The aforementioned DeepMind paper also cites domain adaptation and generalization to unseen tasks as something that could be improved by using world models.</p></li><li><p><strong>Safety:</strong> We&#8217;ve discussed hallucinations in this post already, and the aforementioned Quattrociocchi paper makes an argument about the reliability of results from LLMs. The point of concern is how the system will react to unseen circumstances and whether it can extrapolate in reasonable ways. 
It may be especially important to have mechanisms for guaranteeing the possible range of actions the robot can take and explaining its decisions.</p></li></ol><p>I don&#8217;t feel that there is sufficient information to score the approaches yet, but it appears that model-based approaches may offer advantages in generalization and interpretability, while model-free scaling currently dominates in deployment speed and tooling maturity.</p><h2>Closing thoughts</h2><p>Before closing out this article, I must point out that this &#8220;divide&#8221; is really a spectrum: there is likely a rich space of hybrids of the two approaches, which may consist of hierarchical structures combining the strengths of each. Deep learning excels at parsing and summarization of text and images, automatically finding the most appropriate dimensionality reduction techniques. World models, when coupled with methods that know how to use them, are strong at generalization and abstraction, and can produce very computationally-efficient algorithms.</p><p>In future posts, I plan to write about any new developments on the informational or physical sides that are demonstrating usage and adoption of world models, or of new hybrid architectures. I also plan to write some posts where I construct simple scenarios to fairly evaluate competing architectures along the different metrics above. 
Last but not least, I plan to go into more detail on computational hardware acceleration of non-trivial architectures.</p><p>I believe this is going to be a recurring topic in this publication, so make sure to subscribe and share if you found this interesting.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The underlying transformer has seen performance-related tweaks such as GQA, and more recent &#8220;mixture of experts&#8221; models create a bit of a tree-like structure by combining different models. 
Also, it <a href="https://x.com/fchollet/status/1802785277758591054">can be argued</a> that tool and code interpreter usage by LLMs constitutes a neurosymbolic architecture. However, it is fair to say that these tweaks don&#8217;t represent the headlining scaling strategy for leading AI companies.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>In this context, I believe the post-training component is likely to be reinforcement learning (RL) in simulation; a <a href="https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback">similar approach was used to post-train early LLMs</a>, and it is now enhanced with a multistep process, though pure RL is still used in some applications.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>In fact, &#8220;differentiable simulators&#8221; are increasingly used in RL to allow gradient-reliant training algorithms to work more easily. This is an interesting topic that we will explore more deeply in a future post, so stay subscribed for that.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>The energetic cost of DRAM access is <a href="https://mlsysbook.ai/book/contents/core/hw_acceleration/hw_acceleration.html">orders of magnitude higher</a> than that of a multiply-accumulate operation. 
Systolic architectures require fewer accesses to multiply a whole matrix than conventional scalar architectures, but with the architecture being equal, fewer weights and smaller models would undeniably reduce computational energetic cost.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Model-predictive control of RoboBee flapping flight]]></title><description><![CDATA[Hierarchical model-predictive and data-driven control method published in IJRR (2022)]]></description><link>https://www.avikde.me/p/model-predictive-control-of-robobee</link><guid isPermaLink="false">https://www.avikde.me/p/model-predictive-control-of-robobee</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Wed, 24 Dec 2025 16:11:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/RV9CJE_unHk" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this post, we&#8217;ll go over a method to control the flight of a <a href="https://wyss.harvard.edu/technology/robobees-autonomous-flying-microrobots/">RoboBee</a> in a way that should be approachable for a broad audience. In keeping with this publication&#8217;s focus on energy-efficient robotics, this method was designed to run on extremely low-power computational hardware, as we will see.</p><p>Just to provide brief context,</p><ul><li><p>the RoboBee hardware was at this point fairly mature, and on the &#8220;<a href="https://www.researchgate.net/publication/261354075_Design_Fabrication_and_Modeling_of_the_Split_Actuator_Microrobotic_Bee">Split Dual-Actuator Bee</a>&#8221; generation;</p></li><li><p>the state-of-the-art flight controller was a capable, but task-specific <a href="https://seas.harvard.edu/news/2013/05/robotic-insects-make-first-controlled-flight">hovering controller</a> with limited generalizability.</p></li></ul><p>The goal for this project was to develop a controller that could be easily generalized to more complex tasks using modern control methods. 
The resulting paper<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> with <a href="https://www1.villanova.edu/university/engineering/faculty-research/sports-and-performance/Faculty-Researchers/biodetail.html?mail=rebecca.mcgill@villanova.edu&amp;xsl=bio_long">Dr. Rebecca McGill</a> made some demonstrable advances in terms of better operation away from an upright configuration, the ability to stabilize tasks like following a desired path or executing more dynamic behaviors like perching and flipping, as well as robustness to suboptimal gain tuning and manufacturing variability.</p><p>Here are some hovering clips (short 32s video; no audio):</p><div id="youtube2-RV9CJE_unHk" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;RV9CJE_unHk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/RV9CJE_unHk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The remainder of this post explains how this result was achieved, and potential future extensions of the idea.</p>
<h2>Background</h2><h3>RoboBee and flapping flight</h3><p>The RoboBee is a 100 mg flapping robot <a href="https://www.harvardmagazine.com/science-technology/harvard-robot-bees-future-robotic-engineering">developed by Dr. Rob Wood</a>, capable of hovering and controlled flight. To put it in perspective, a US nickel weighs 5 g, the equivalent of 50 RoboBees. Having spent a lot of time in the Harvard microrobotics lab fabricating them, I can say without exaggeration that a sneeze can literally destroy weeks of work.</p><p>Along with a family of similarly-fabricated robotic systems developed at the Harvard microrobotics lab, RoboBees are actuated by piezoelectric bending actuators. The piezoelectric effect is commonly seen in the working of microphones, which convert vibrations created by acoustic pressure waves into electric signals. Piezoelectric materials also work in reverse, converting electric pulses into vibratory motion. The RoboBee&#8217;s actuators are constructed similarly to a bimetallic strip, converting slight expansion and contraction of the piezoelectric material into a bending motion.</p><p>Generally, the piezoelectric actuators produce very small motions that need to be amplified to produce the requisite aerodynamic work. After the conversion to the bending motion, the output also passes through another transmission that converts the small translational bending motion into a rotational motion. 
In a previous post, I went into the details of how this transmission works, and a project I worked on to optimize it.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;8765e2d3-bccc-43a6-8afe-49948f2ba8a5&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Using models to design a RoboBee&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Writing about safe, efficient AI -- Robotics Ph.D. and founder&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-12-22T00:00:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!_Aa4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/template-based-design-robobee&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:182198523,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>OK, now we 
are at the stage of converting electrical signals into rotational motion of the wing. The wing itself is attached to the end of the transmission via a passive hinge, so that when the base of the wing is flapped, it not only flaps, but also pivots about its hinge, thereby passively changing its pitch, or angle-of-attack. This motion is common among flapping animals:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N2Xl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N2Xl!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 424w, https://substackcdn.com/image/fetch/$s_!N2Xl!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 848w, https://substackcdn.com/image/fetch/$s_!N2Xl!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 1272w, https://substackcdn.com/image/fetch/$s_!N2Xl!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N2Xl!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif" width="480" height="270" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:270,&quot;width&quot;:480,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N2Xl!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 424w, https://substackcdn.com/image/fetch/$s_!N2Xl!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 848w, https://substackcdn.com/image/fetch/$s_!N2Xl!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 1272w, https://substackcdn.com/image/fetch/$s_!N2Xl!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Hummingbird hovering showing changing wing pitch over flapping cycle. <a href="https://www.youtube.com/watch?v=RtUQ_pz5wlo">Source: NatGeoWild</a></figcaption></figure></div><p>RoboBee&#8217;s clever design allows the wing pitch to change passively as the wing flaps, i.e. 
only one actuator is needed per wing to obtain something resembling the complex wing motion of the hummingbird above:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bM5H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bM5H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 424w, https://substackcdn.com/image/fetch/$s_!bM5H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 848w, https://substackcdn.com/image/fetch/$s_!bM5H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 1272w, https://substackcdn.com/image/fetch/$s_!bM5H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bM5H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png" width="484" height="365.06484641638224" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:663,&quot;width&quot;:879,&quot;resizeWidth&quot;:484,&quot;bytes&quot;:79213,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/182263726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bM5H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 424w, https://substackcdn.com/image/fetch/$s_!bM5H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 848w, https://substackcdn.com/image/fetch/$s_!bM5H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 1272w, https://substackcdn.com/image/fetch/$s_!bM5H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">A schematic showing the construction of a &#8220;half-RoboBee,&#8221; where the piezoelectric bending actuator, transmission, and both wing joints can be seen. Figure from <a href="https://scholar.google.com/citations?view_op=view_citation&amp;hl=en&amp;user=m-A4ZdEAAAAJ&amp;sortby=pubdate&amp;citation_for_view=m-A4ZdEAAAAJ:ODE9OILHJdcC">this paper</a>.</figcaption></figure></div><h3>Modeling RoboBee&#8217;s flight</h3><p>A model of the motion produced is very important to understand how to use the available wing input signals to get to a desired goal. There is a large debate between model-based and model-free methods (the latter eschew models in favor of gathering a lot of data from the black-box system and approximating its behavior). 
The recent growth in computational power has increased the temptation to abandon models, though in many sim2real reinforcement learning approaches, models are used in developing the simulation.</p><p>In the case of RoboBee, the difficulty with pursuing a fully model-based method is that aerodynamics is quite difficult to model. Nonetheless, some work in the early 2010s on <a href="https://en.wikipedia.org/wiki/Blade_element_theory">blade-element modeling</a> has proved quite useful for understanding the relation of RoboBee wing motion to the produced lift and drag forces. Using that model, we developed a RoboBee simulator, which is open-sourced<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. We will discuss the software supporting this work further below, but here is an animation of some fixed control inputs (similar to the 2013 flight control work) producing simulated flapping flight, complete with passive wing pitching (short 17s video; no audio):</p><div id="youtube2-Qm0_yIEXycU" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Qm0_yIEXycU&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Qm0_yIEXycU?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The disadvantage of the model above is that it is very complex and cannot be used directly to develop a controller. However, the other components of RoboBee dynamics (excluding how the wing produces lift and drag) are well-explained by Newtonian physics. 
In this latter area, there is a great degree of similarity to the control of legged robots.</p><p>Typically the worlds of flapping flight and legged control do not overlap, but there are a number of similarities that motivate the use of similar methods. They are both</p><ul><li><p><strong>cyclic</strong> (though in the RoboBee case, the wings are assumed massless and flap so fast that their dynamics are considered decoupled from the body);</p></li><li><p><strong>mechanics-dominated</strong> (it is very important to consider the physics of ground interactions and aerodynamics); and</p></li><li><p><strong>underactuated</strong> (we don&#8217;t have enough actuators to fully stabilize the motion, and typically in these scenarios some amount of &#8220;lookahead planning&#8221; is required).</p></li></ul><p>In the legged robotics field, there is a long tradition of using simplified models to aid in control development (so-called &#8220;spring-mass&#8221; models), as I have <a href="https://www.avikde.me/p/jerboa-hopping-video">discussed</a> <a href="https://www.avikde.me/p/vertical-hopper-compositions">before</a>. In this paper, we introduce for the first time an equivalent for RoboBee-like flapping flight.</p><h3>Model-predictive control (MPC)</h3><p>As discussed above, in underactuated scenarios, it is typically the case that the future behavior of the system must be predicted in order to decide which inputs to supply. 
As a simple example, how should the cart be moved in order to get the attached pole to swing up?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6tS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6tS3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 424w, https://substackcdn.com/image/fetch/$s_!6tS3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 848w, https://substackcdn.com/image/fetch/$s_!6tS3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 1272w, https://substackcdn.com/image/fetch/$s_!6tS3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6tS3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif" width="562" height="421.20421052631576" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:712,&quot;width&quot;:950,&quot;resizeWidth&quot;:562,&quot;bytes&quot;:435249,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/182263726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6tS3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 424w, https://substackcdn.com/image/fetch/$s_!6tS3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 848w, https://substackcdn.com/image/fetch/$s_!6tS3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 1272w, https://substackcdn.com/image/fetch/$s_!6tS3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Animation borrowed from <a href="https://commons.wikimedia.org/wiki/File:Cart-pole_swing_up.gif">here</a> per the <a href="https://en.wikipedia.org/wiki/en:Creative_Commons">Creative Commons</a> <a href="https://creativecommons.org/licenses/by-sa/4.0/deed.en">Attribution-Share Alike 4.0 International</a> license.</figcaption></figure></div><p>A global understanding of the future behavior of the system can be summarized in a so-called <a href="https://en.wikipedia.org/wiki/Value_function">value function</a>, and knowing this function can tell us exactly which way we should move to get to our goal from all states.</p><p>The problem is, the value function is not &#8220;known.&#8221; It can be estimated by exhaustively poking and prodding the system (which is an approach that resembles reinforcement learning). 
However, when we know of a dynamical model for the system, it is sensible to use it, because treating the dynamics as fixed greatly reduces the dimensionality of the control problem.</p><p>Model-predictive control (MPC) tries to create a small local approximation of the value function <em>online</em> by using the future state of the system over a short prediction horizon (subject to a model) as a proxy for the value of the current state. MPC is by now an old technique, but it is widely used in industrial process automation, aerospace, etc.</p><h2>Approach: model-based MPC and model-free inverse dynamics</h2><p>Here is the overall plan:</p><ol><li><p>Develop a simplified model capturing the desired behavior: for this step, I noted that we do not care about the heading, but instead simply that the robot stays upright.</p></li><li><p>&#8220;Anchor&#8221; the behavior onto the RoboBee: convert the model&#8217;s inputs to signals that get sent to the actuators.</p></li></ol><p>The system architecture figure below makes this explicit. The purple &#8220;flying brick&#8221; is the model, whose future states we can predict for known inputs. The MPC can then effectively back out the best inputs <em>for that model</em> to get to a desired state. 
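</p><p>In generic textbook form (a sketch, not the paper&#8217;s exact formulation), this &#8220;backing out&#8221; is a finite-horizon optimization. The symbols here are assumptions introduced for illustration: <em>x<sub>k</sub></em> is the model state at step <em>k</em>, <em>u<sub>k</sub></em> is the input, <em>f</em> is the model dynamics, <em>&#8467;</em> is a per-step cost, and <em>N</em> is the horizon length:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\min_{u_0, \\ldots, u_{N-1}} \\sum_{k=0}^{N-1} \\ell(x_k, u_k) ~~ \\text{subject to} ~~ x_{k+1} = f(x_k, u_k), ~~ x_0 = x_{\\text{measured}}&quot;}" data-component-name="LatexBlockToDOM"></div><p>Typically, only the first input <em>u<sub>0</sub></em> is applied before the problem is solved again from the newly measured state.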
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J6jd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J6jd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 424w, https://substackcdn.com/image/fetch/$s_!J6jd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 848w, https://substackcdn.com/image/fetch/$s_!J6jd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!J6jd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J6jd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg" width="1456" height="1012" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1012,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 
1&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 1" title="Figure 1" srcset="https://substackcdn.com/image/fetch/$s_!J6jd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 424w, https://substackcdn.com/image/fetch/$s_!J6jd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 848w, https://substackcdn.com/image/fetch/$s_!J6jd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!J6jd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 
11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">System architecture</figcaption></figure></div><p>However, as the blue and green arrows in the figure show, that only partially solves our problem because the RoboBee is not a flying brick. To address the gap, we need to define the operations undertaken by the arrows:</p><ul><li><p><strong>State projection (blue arrow): </strong>This process is relatively simple for this instance, because the state of the flying brick is effectively a subset of the state of the actual RoboBee. It has an elevation and body tilt angles just like the RoboBee, and we simply project the coordinates to those of the flying brick.</p></li><li><p><strong>Inverse dynamics (green arrow): </strong>The other direction is more complex&#8212;in essence, we want to go from the abstracted thrust/roll/pitch torque inputs for the brick &#8594; RoboBee wing actuator signals. 
This process is difficult for a couple of reasons:</p><ul><li><p>The mapping is far more involved than any kind of projection, because it chains several components:</p><ul><li><p>Wing voltage &#8594; actuator motion (depends on piezoelectric actuator electrical and mechanical properties)</p></li><li><p>Actuator motion &#8594; wing base motion (depends on transmission and its stiffness)</p></li><li><p>Wing base motion &#8594; wing motion (depends on hinge and wing mechanical properties)</p></li><li><p>Wing motion &#8594; reaction forces and torques (depends on wing aerodynamic interactions, ground effect, etc.)</p></li></ul></li><li><p>Manufacturing variability makes this mapping inexact (if you manufactured two RoboBees, they might require different wing signals to produce the same wing motion)</p></li></ul></li></ul><p>For these reasons, models have limited utility for the green arrow, and so the paper proposed a model-free method for that part.</p><h2>Model-based MPC</h2><h3>Template: upright rigid body</h3><p>First, we need to pick the model. As the saying goes, all models are wrong, but the goal here is to capture the most important parts of the dynamics and of the objective.</p><p>The RoboBee&#8217;s wings are very light, and so most of its mass is truly contained in its body (more on this below). Dynamically, this is well-approximated by the flying brick, with no other moving parts.</p><p>To capture the objective, we note that we do not particularly care about the heading of the RoboBee when we just want it to hover, or fly controllably. This allows us to effectively remove one degree of freedom from our specification of the objective, and capture the state of the flying brick as follows:</p><ul><li><p>To capture the position, we use the <em>(x, y, z)</em> Cartesian coordinates of the center of mass as expected.</p></li><li><p>To capture the orientation, we only look at the components of the &#8220;upright vector&#8221; (a vector pointing up in the body frame). 
Note that an objective of hovering can be simply stated as the desire to have the upright vector point vertically up.</p></li></ul><h3>Waypoint tracking MPC</h3><p>We write the dynamical equations for the flying brick using the Newton-Euler equations for the motion of a rigid body. After a small approximation as described in the paper, we get</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\ddot p = s T - g e_3, ~~\n\\ddot s = -\\hat s B \\tau,&quot;,&quot;id&quot;:&quot;FBUJCMUYRK&quot;}" data-component-name="LatexBlockToDOM"></div><p>where <em>p</em> represents the Cartesian position, <em>s</em> represents the upright vector, <em>T</em> represents the (scalar) upward thrust, and &#964; represents the (2-dimensional) roll, pitch torque vector.</p><p>These equations are quite simple, owing mainly to the fact that the wings are quite light, and so their flapping does not significantly impact the motion of the much more massive body. This concept is also utilized in many legged running robots, referred to there as &#8220;<a href="https://underactuated.mit.edu/humanoids.html">massless legs</a>.&#8221; It&#8217;s worth taking a minute to appreciate the significance of this: in practice, human limbs are not massless, which allows (for example) a gymnast to adjust their body orientation while flying through the air by controllably moving their limbs, and so land a flip. However, mastering that kind of control is much more difficult than working in the massless legs (or wings) paradigm, where we can safely assume that the appendages simply produce a force or torque that acts on the body. A helpful picture to have in mind is that in the massless appendage paradigm, we can replace the appendages with thrusters attached at the appendage base, and pretend we are controlling the thrust vector instead.</p><p>Upon further inspection, the equations are second-order (as expected for any mechanical system). 
The equations are also unfortunately nonlinear, as can be seen from the state <em>s</em> multiplying the inputs <em>T</em> and &#964; on the right-hand sides. This is also normal for such systems, but adds a challenge to our MPC transcription.</p><p>To resolve this difficulty, we <em>linearize</em> these dynamics at the current orientation and thrust <em>(s<sub>0</sub>, T<sub>0</sub>) </em>before incorporating them into the model-predictive controller. The controller will reason about the best inputs based on how they act on the current state, which intuitively is fine for a short enough planning horizon.</p><p>As an analogy, a car driver on the highway will turn their steering wheel slightly to change lanes (an action that is appropriate for a planning horizon of a few seconds), even though holding that action over a long enough horizon would drive them off the highway. Similar to the car driver, the RoboBee in this scenario will re-evaluate its inputs with a new state soon enough. MPC always works this way, with a finite planning horizon that extends a short duration past the current time.</p><p>The objective for the MPC is to track a trajectory of future states, including a position and velocity. For example, to hover, the desired position is the hovering goal position, and the desired velocity is zero. 
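</p><p>As a sketch of what this could look like (one possible first-order expansion and a generic quadratic cost, not necessarily the exact forms used in the paper), expanding the dynamics about <em>(s<sub>0</sub>, T<sub>0</sub>)</em> and assuming a nominal torque of zero gives</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\ddot p \\approx T_0 s + s_0 T - s_0 T_0 - g e_3, ~~\n\\ddot s \\approx -\\hat s_0 B \\tau,&quot;}" data-component-name="LatexBlockToDOM"></div><p>which is affine in the state and in the inputs <em>T</em> and &#964;. A tracking objective over an <em>N</em>-step horizon could then be written as follows, where the weight matrices <em>Q</em> and <em>R</em>, the stacked state <em>x<sub>k</sub></em>, its desired value <em>x<sub>k</sub><sup>des</sup></em>, and the input <em>u<sub>k</sub></em> are names assumed here for illustration:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;J = \\sum_{k=1}^{N} (x_k - x_k^{\\text{des}})^\\top Q (x_k - x_k^{\\text{des}}) + \\sum_{k=0}^{N-1} u_k^\\top R u_k&quot;}" data-component-name="LatexBlockToDOM"></div><p>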
To follow a particular path in space, that path can be discretized and substituted into the desired positions.</p><h3>Simulation evaluation</h3><p>To evaluate if the MPC with the linearized dynamics above works appropriately, we can compare the performance of the controller in a number of simple tasks in an apples-to-apples comparison with the prior state-of-the-art reactive controller.</p><h4>Hovering, trajectory following</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XFiy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XFiy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XFiy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XFiy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XFiy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!XFiy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg" width="1456" height="655" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:655,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 4&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 4" title="Figure 4" srcset="https://substackcdn.com/image/fetch/$s_!XFiy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XFiy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XFiy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XFiy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Simulation evaluation of MPC vs. 
reactive on the upright model.</figcaption></figure></div><p>The tasks, as shown above, were:</p><ul><li><p>Hovering, starting from an initial orientation with roll and pitch angles of 0.5 rad and -0.5 rad, and an initial velocity of 0.1 m/s in the <em>x</em>-direction.</p></li><li><p>Waypoint tracking on an &#8220;S&#8221;-shaped trajectory in the <em>xz</em>-plane.</p></li><li><p>Tracking a commanded velocity of 2 m/s for 0.5 seconds before stopping.</p></li></ul><p>In each of these scenarios, the MPC performs better than the reactive controller (notes on tuning below), which is promising.</p><h4>Perching, flipping</h4><p>Specifying a task in terms of a reference trajectory can be onerous. For example, if we want the bee to do a backflip, it isn&#8217;t clear what sequences of positions and velocities are appropriate for the horizon.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> To test the robustness of the MPC, here we feed it &#8220;made up&#8221; infeasible trajectories and see how well it can track them.</p><p>The tasks we choose to test include the aforementioned flip, and a wall-perching behavior inspired by this past research:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YwQT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YwQT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 424w, 
https://substackcdn.com/image/fetch/$s_!YwQT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 848w, https://substackcdn.com/image/fetch/$s_!YwQT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 1272w, https://substackcdn.com/image/fetch/$s_!YwQT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YwQT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png" width="551" height="177" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:177,&quot;width&quot;:551,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YwQT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 424w, 
https://substackcdn.com/image/fetch/$s_!YwQT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 848w, https://substackcdn.com/image/fetch/$s_!YwQT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 1272w, https://substackcdn.com/image/fetch/$s_!YwQT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">A perching task from <a href="https://www.science.org/doi/abs/10.1126/science.aaf1092">this paper</a> from 2016.</figcaption></figure></div><p>The reference trajectories are selected intentionally naively:</p><ul><li><p>For the perch task, the desired position translates smoothly to the right, and the desired orientation steadily rotates to 90 degrees at the end of the motion</p></li><li><p>For the flip task, the desired position is fixed, and the desired orientation smoothly rotates 360 degrees.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xi_9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xi_9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!Xi_9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Xi_9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Xi_9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xi_9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg" width="1456" height="422" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:422,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 5&quot;,&quot;title&quot;:&quot;Figure 5&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 5" title="Figure 5" srcset="https://substackcdn.com/image/fetch/$s_!Xi_9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!Xi_9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Xi_9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Xi_9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Simulation evaluation of perch and flip behaviors.</figcaption></figure></div><p>The results show that the MPC is able to compensate for the naivet&#233; of the reference trajectories and accomplish both tasks satisfactorily. The reactive hover controller cannot solve these tasks.</p><h4>A note on tuning the controllers</h4><p>Something that most research papers sweep under the rug is how the controllers were tuned. The previous state-of-the-art reactive controller has hand-tuned PD gains, and the MPC has weights on the objective. To make a fair comparison, we have to tune both as well as possible.</p><p>In general, there is a tradeoff between tracking error and tracking effort. As an analogy, cruise control systems in cars often have an eco mode, in which they may deviate from the speed setpoint a bit more but use less fuel. Similarly, you can spend less actuator effort in exchange for tracking the goal a little less precisely. 
This is usually one of the ways in which controllers are tuned in practice.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L-j6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L-j6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 424w, https://substackcdn.com/image/fetch/$s_!L-j6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 848w, https://substackcdn.com/image/fetch/$s_!L-j6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!L-j6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L-j6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg" width="1456" height="753" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:753,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 7&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 7" title="Figure 7" srcset="https://substackcdn.com/image/fetch/$s_!L-j6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 424w, https://substackcdn.com/image/fetch/$s_!L-j6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 848w, https://substackcdn.com/image/fetch/$s_!L-j6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!L-j6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><strong>Left: </strong>The MPC can attain low tracking error with a broad swath of weight magnitudes; <strong>right:</strong> comparing the MPC and reactive controller tuning.&#8203;</figcaption></figure></div><p>The plot on the right shows the MPC and the reactive controllers fairly compared with a variety of tuning gains, showing that the MPC is significantly easier to tune, and can track better with lower actuator effort than is possible with the reactive controller.</p><h2>Data-driven inverse dynamics</h2><p>As we discussed above, the mapping from actuator signal &#8594; produced force/torque is unknown/uncertain due to the system complexity and manufacturing variability.</p><p>An example of a common type of manufacturing variability is that some RoboBee transmissions just exhibit higher stiffness than others. 
If the left wing has a stiffer transmission than the right wing, the left wing may flap with a smaller wing amplitude than the right one when driven equivalently, and produce much less lift force.</p><p>In this project we took the approach of breaking down the components of this mapping, and just using data to characterize the variable parts. This meant collecting data of wing kinematics as a function of actuator signals and then fitting a function to approximate some &#8220;kinematics features&#8221; that could be expected for each actuator signal:&#8203;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XgSF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XgSF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XgSF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XgSF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XgSF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!XgSF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg" width="1456" height="1035" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1035,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 3&quot;,&quot;title&quot;:&quot;Figure 3&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 3" title="Figure 3" srcset="https://substackcdn.com/image/fetch/$s_!XgSF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XgSF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XgSF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XgSF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 
pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The measured kinematics features relate to the wing flap upstroke and downstroke amplitudes and the attained wing pitch.</figcaption></figure></div><p>We then used the blade-element model to predict the reaction force/torque from the wing kinematics.</p><p>To show the effect of this kind of mapping, we performed the same operation in the RoboBee simulator, and simulated the effect of adding a force bias of 3 mN to one of the actuators. 
With no force bias, the data-driven mapping and the manually-tuned mapping both work, but with the force bias, the data-driven mapping can still work while the manually-tuned mapping fails.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dA35!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dA35!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 424w, https://substackcdn.com/image/fetch/$s_!dA35!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 848w, https://substackcdn.com/image/fetch/$s_!dA35!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 1272w, https://substackcdn.com/image/fetch/$s_!dA35!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dA35!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805" width="896" height="805" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/05c9395d-501c-477c-ab26-7157c2ad65c2_896x805&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:805,&quot;width&quot;:896,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!dA35!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 424w, https://substackcdn.com/image/fetch/$s_!dA35!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 848w, https://substackcdn.com/image/fetch/$s_!dA35!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 1272w, https://substackcdn.com/image/fetch/$s_!dA35!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Comparison of data-driven (WLQP) inverse dynamics to manually tuned mapping.</figcaption></figure></div><h2>Hardware integration</h2><h3>Setup</h3><p>Encouraged by the simulation results, we pushed ahead to integrate the MPC into the physical RoboBee control system, which looks as follows:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lQGk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lQGk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!lQGk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 848w, https://substackcdn.com/image/fetch/$s_!lQGk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!lQGk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lQGk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg" width="1456" height="611" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:611,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 10&quot;,&quot;title&quot;:&quot;Figure 10&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 10" title="Figure 10" srcset="https://substackcdn.com/image/fetch/$s_!lQGk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!lQGk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 848w, https://substackcdn.com/image/fetch/$s_!lQGk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!lQGk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">System architecture for RoboBee flight experiments and the actual experimental setup. The tether becomes slack during flight.</figcaption></figure></div><p>The actuators were connected to a <a href="https://www.mathworks.com/products/simulink-real-time.html">Simulink Real-Time</a> control PC, which was new to me. The setup encourages code to mostly be compiled from graphical blocks such as filters, delays, etc., but does allow for custom blocks written as MATLAB functions. While the state estimator and some other components were in fact MATLAB functions, we implemented the MPC in C using <a href="https://osqp.org/">OSQP</a>, as part of a more forward-looking architecture that could also run onboard the RoboBee on a microcontroller.</p><p>When run from the Simulink target PC, every component iterated at 5 kHz, since the Simulink architecture locks all the rates together. The MPC itself also ran at 100-200 Hz on a small STM32G4 MCU that fell within the 25 mg payload constraint of the RoboBee. We verified that the controller could successfully stabilize the simulator when run at 100 Hz.</p><h3>Experimental results for hovering</h3><p>A video clip of some of the hovering results was linked in the introduction of this post. 
Some overlaid trajectories from those trials are shown in the figure below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FX6O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FX6O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 424w, https://substackcdn.com/image/fetch/$s_!FX6O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 848w, https://substackcdn.com/image/fetch/$s_!FX6O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 1272w, https://substackcdn.com/image/fetch/$s_!FX6O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FX6O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png" width="480" height="613.3333333333334" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1426,&quot;width&quot;:1116,&quot;resizeWidth&quot;:480,&quot;bytes&quot;:564795,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/182263726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FX6O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 424w, https://substackcdn.com/image/fetch/$s_!FX6O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 848w, https://substackcdn.com/image/fetch/$s_!FX6O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 1272w, https://substackcdn.com/image/fetch/$s_!FX6O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Each trial ended due to the motion capture system losing track of the RoboBee, or by a command we sent. 
We were able to keep the orientation stabilized in each trial, though the horizontal position drifted more than desired.</p><p>The hovering task was overall a good demonstration of the feasibility of integrating this much more advanced controller paradigm into the RoboBee.</p><p>In the future, it would be very exciting to see either or both:</p><ul><li><p>some of the tasks we tested (and compared to the reactive controller) in the simulation section running on the RoboBee</p></li><li><p>the controller running on a microcontroller, along with onboard sensing and power, for fully untethered complex flight</p></li></ul><h2>Implementation details and replicating results</h2><p>In the interests of open science, the code for the various parts of this project is all <a href="https://github.com/avikde/robobee3d">online</a>. While I don&#8217;t have continued access to the Simulink software and experimental setup, please comment below if you need support; continued progress and replicability are well worth the support and debugging.</p><h3>MPC</h3><p>This is implemented as a quadratic program with OSQP.</p><ul><li><p>The quadratic program is defined in <a href="https://github.com/avikde/robobee3d/blob/master/template/genqp.py">genqp.py</a>. When that file is run as a script, it instantiates the controller and runs a test; alternatively, the commented-out section at the bottom runs <a href="https://osqp.org/docs/codegen/index.html">OSQP&#8217;s codegen</a> feature to generate a standalone set of C files that can solve the QP. The codegen output is stored in the <a href="https://github.com/avikde/robobee3d/tree/master/template/uprightmpc2">uprightmpc2</a> directory (though it can be regenerated as well).</p></li><li><p>The codegen outputs define the structure of the problem, but the variables need to be <a href="https://osqp.org/docs/examples/update-matrices.html">updated</a> as the current state of the RoboBee or the reference trajectory changes. 
To do this, the <a href="https://github.com/avikde/robobee3d/blob/master/template/uprightmpc2/uprightmpc2.h">uprightmpc2.h</a> file provides some simple functions with named parameters that can be called. The C file of the same name contains the implementation.</p></li><li><p>The C code in the uprightmpc2 directory can be built using CMake, with something like:</p></li></ul><pre><code>cd uprightmpc2
mkdir -p build &amp;&amp; cd build
cmake ..
cmake --build .</code></pre><h3>Simulations</h3><ul><li><p>The simulations testing the MPC with the upright template model can be run from the <a href="https://github.com/avikde/robobee3d/tree/master/template">template</a> directory.</p></li></ul><ul><li><p>When run as a script, the <a href="https://github.com/avikde/robobee3d/blob/master/template/uprightmpc2.py">uprightmpc2.py</a> file should recreate the test scenarios covered in the plots above and in the paper. The bottom of the file contains commented-out code describing the test scenarios, which can be uncommented to run them.</p></li><li><p>The 3D PyBullet simulation can be run by executing the <a href="https://github.com/avikde/robobee3d/blob/master/template/robobee.py">robobee.py</a> script.</p></li></ul><h3>Simulink setup</h3><ul><li><p>The C code is integrated into the Simulink Real-Time setup as an <a href="https://www.mathworks.com/help/simulink/sfg/what-is-an-s-function.html">S-function</a>; the legacy_code_gen.m file configures the inputs and outputs of the block that will appear in Simulink. 
See <a href="https://www.mathworks.com/help/simulink/sfg/integrating-existing-c-functions-into-simulink-models-with-the-legacy-code-tool.html">this page</a> for more guidance on this process, which was quite tricky.</p></li><li><p>The Simulink model files are .slx files, and can be found <a href="https://github.com/avikde/robobee3d/tree/master/template/matlab">here</a>.</p></li></ul><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://journals.sagepub.com/doi/pdf/10.1177/02783649211063225">An efficient, modular controller for flapping flight composing model-based and model-free components - Avik De, Rebecca McGill, Robert J Wood, 2022</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://github.com/avikde/robobee3d">avikde/robobee3d: Robobee research including controls, modeling, and simulation</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>In practice, for these kinds of tasks, state-of-the-art approaches commonly use offline optimization or learning (which takes much more computation to run) to figure out the best trajectory, and then use that trajectory as the reference for the MPC.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Power-efficient and safe mobile robots]]></title><description><![CDATA[Talk at OSU CoRIS seminar]]></description><link>https://www.avikde.me/p/power-efficient-safe-robots</link><guid isPermaLink="false">https://www.avikde.me/p/power-efficient-safe-robots</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Mon, 
23 Dec 2024 00:00:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bxUH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I gave a <a href="https://engineering.oregonstate.edu/events/power-efficient-autonomous-mobile-robots">talk at OSU&#8217;s CoRIS seminar</a>. It was a joy to visit OSU&#8217;s Robotics department. The faculty are driven to solve problems grounded in the real world, in application areas ranging from under the sea to the peak of Mt. Hood. Also, it was only partially raining on the day of the seminar (which I found out was a rarity).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bxUH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bxUH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bxUH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bxUH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!bxUH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bxUH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg" width="600" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;OSU&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="OSU" title="OSU" srcset="https://substackcdn.com/image/fetch/$s_!bxUH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bxUH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bxUH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!bxUH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The <a href="https://en.wikipedia.org/wiki/Pacific_Northwest">PNW</a> scenery is terrific and would be a great draw if it didn&#8217;t mostly rain from September to May.</figcaption></figure></div><h2>Modularity</h2><p>In this talk, I started a bottom-up exploration of composition in robotics.</p><div 
class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"></div><h3>Dynamic legged locomotion</h3><p>Like many others, I&#8217;m sure, I was inspired as a young graduate student by the dynamic legged locomotion work at the MIT Leg Lab in the 1980s:</p><div id="youtube2-Bd5iEke6UlE" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Bd5iEke6UlE&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Bd5iEke6UlE?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>In his thought-provoking <a href="https://mitpress.mit.edu/9780262681193/legged-robots-that-balance/">book</a>, <a href="https://en.wikipedia.org/wiki/Marc_Raibert">Raibert</a> articulated an intriguing idea called &#8220;Control of Running Decomposed into Three Parts.&#8221; Researchers have since been trying to understand when and how this may be possible, and how it generalizes.</p><p>My Ph.D. advisor, <a href="https://directory.seas.upenn.edu/daniel-e-koditschek/">Koditschek</a>, has been working on that question for decades. 
In the 1990s, his research group built an impressive array of juggling robots (as a less power-hungry proxy for cyclic dynamical behavior):</p><div id="youtube2-u8I7EXXgTvk" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;u8I7EXXgTvk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/u8I7EXXgTvk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>In the course of the juggling research, they introduced a formal notion of <a href="https://deepblue.lib.umich.edu/bitstream/handle/2027.42/67990/10.1177_02783649922066385.pdf">sequential composition</a>, an intuitive yet mathematically rigorous and useful idea:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z6_V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z6_V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 424w, https://substackcdn.com/image/fetch/$s_!Z6_V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 848w, 
https://substackcdn.com/image/fetch/$s_!Z6_V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 1272w, https://substackcdn.com/image/fetch/$s_!Z6_V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z6_V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png" width="400" height="407" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:407,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Sequential Composition in IJRR '99&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sequential Composition in IJRR '99" title="Sequential Composition in IJRR '99" srcset="https://substackcdn.com/image/fetch/$s_!Z6_V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 424w, https://substackcdn.com/image/fetch/$s_!Z6_V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 848w, 
https://substackcdn.com/image/fetch/$s_!Z6_V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 1272w, https://substackcdn.com/image/fetch/$s_!Z6_V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The &#8220;funnels&#8221; picture of sequential composition</figcaption></figure></div><p>Analogously, we can 
retroactively label Raibert&#8217;s &#8220;control in three parts&#8221; idea as an example of <a href="/jerboa-hopping-video">parallel composition</a>. While the term is not very common in the robotics literature, similar concepts appear under names such as &#8220;decoupled control&#8221;. The idea has clearly been empirically useful, but <a href="/hybrid-averaging">formalizing it</a> with any degree of generality has been quite tricky.</p><p>Sequential and parallel composition are very intuitive ideas with equivalents in programming and spoken language. Consider the example of generating spoken language: instead of outputting the sounds corresponding to an entire sentence at once, we may want to start by assembling words from <a href="https://en.wikipedia.org/wiki/Phoneme">phonemes</a>, and then assemble those words into sentences. On the other hand, modern <a href="https://en.wikipedia.org/wiki/Deep_learning_speech_synthesis">deep learning speech synthesis</a> may not have any such compositional properties, which is an intentional counterpoint that we will return to.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/ChengleiSi/status/1731047065382523332?s=20&quot;,&quot;full_text&quot;:&quot;I saw debates on whether GPT-4V can &#8220;solve&#8221; compositionality, so I spent my precious Friday afternoon benchmarking it on Winoground.\n\nTldr: NO it&#8217;s still far from solved (GPT-4V 38.0% vs PaLI 28.8% vs MTurk Humans 85.5%).\n\nColab w/ all results: <a class=\&quot;tweet-url\&quot; href=\&quot;https://tinyurl.com/winogpt4v\&quot;>tinyurl.com/winogpt4v</a> 
\n\n&#129525;(1/n)&quot;,&quot;username&quot;:&quot;ChengleiSi&quot;,&quot;name&quot;:&quot;CLS&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/1356609929243734018/FDzdwcv6_normal.jpg&quot;,&quot;date&quot;:&quot;2023-12-02T20:25:56.000Z&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:7,&quot;retweet_count&quot;:48,&quot;like_count&quot;:323,&quot;impression_count&quot;:115213,&quot;expanded_url&quot;:null,&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h3>Modularity elsewhere</h3><p>Deep learning did evolve from neural networks, which evoke biology right in the name. Biology has <a href="/what-are-robot-dogs">inspired many of the working principles</a> of quadrupedal robots, including behavioral modularity.</p><p>Animals have an abundance of sensory inputs and muscles, but the number of task-level variables important to any particular task is much smaller (<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC4121431/">Ting (2007)</a>). Going further, <a href="https://www.sciencedirect.com/science/article/pii/S0896627315001579">Ting et al. (2015)</a> argue that motor modules arise from neural plasticity in spinal structures that selectively coordinate and co-activate multiple muscles. 
The result is that animals can control tasks like balancing in a hierarchical fashion, keeping the dimension of the task-space control low.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8aEl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8aEl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 424w, https://substackcdn.com/image/fetch/$s_!8aEl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 848w, https://substackcdn.com/image/fetch/$s_!8aEl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 1272w, https://substackcdn.com/image/fetch/$s_!8aEl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8aEl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png" width="1456" height="638" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:638,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Modules in Biology&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Modules in Biology" title="Modules in Biology" srcset="https://substackcdn.com/image/fetch/$s_!8aEl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 424w, https://substackcdn.com/image/fetch/$s_!8aEl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 848w, https://substackcdn.com/image/fetch/$s_!8aEl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 1272w, https://substackcdn.com/image/fetch/$s_!8aEl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Modularity in biology</figcaption></figure></div><p>While robots typically have fewer actuators than an animal has muscles, a general-purpose robot will typically be overactuated for any individual task. For example, a humanoid robot does not need its arms to maintain a standing posture.</p><p>If we accept the presence of these motor modules, their patterns of activation can be re-used for different behaviors. Quoting <a href="https://www.sciencedirect.com/science/article/pii/S0896627315001579">Ting et al. 
(2015)</a>:</p><blockquote><p>Multifunctionality: muscles can contribute to many actions; a few muscles can be combined in many ways to produce a wide range of different actions.</p></blockquote><p>Making equivalences to the synthetic disciplines, there is a clear connection to the idea of re-using behavioral modules, as we showed with <a href="/vertical-hopper-compositions">Minitaur vertical hopper compositions</a>.</p><p>Putting it all together, I&#8217;d argue that there are equivalences between biology and robotics in three distinct aspects of modularity:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WRpR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WRpR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 424w, https://substackcdn.com/image/fetch/$s_!WRpR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 848w, https://substackcdn.com/image/fetch/$s_!WRpR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 1272w, https://substackcdn.com/image/fetch/$s_!WRpR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!WRpR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png" width="1341" height="290" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:290,&quot;width&quot;:1341,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Modularity is Everywhere&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Modularity is Everywhere" title="Modularity is Everywhere" srcset="https://substackcdn.com/image/fetch/$s_!WRpR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 424w, https://substackcdn.com/image/fetch/$s_!WRpR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 848w, https://substackcdn.com/image/fetch/$s_!WRpR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 1272w, https://substackcdn.com/image/fetch/$s_!WRpR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h3>Modularity benefits</h3><p>Some 
of the benefits of modularity enjoyed by biological systems also apply to the synthetic disciplines.</p><p>Motor modules can help navigate a &#8220;difficult-to-search and nonlinear set of neuromechanical solutions for movement&#8221; (<a href="https://www.sciencedirect.com/science/article/pii/S0896627315001579">Ting et al. (2015)</a>), much as modular structure helps with the &#8220;curse of dimensionality&#8221; in various engineering disciplines. This has clear implications for the computational requirements of algorithms.</p><p>A slightly less obvious use case for modularity is optimizing robot design for <a href="/template-based-design-robobee">flapping</a>, <a href="https://www.science.org/doi/abs/10.1126/scirobotics.aag2048">jumping</a>, etc., using coordinated movement patterns (or template trajectories).</p><h2>Real-world robotics</h2><p>As robotics tools proliferate, their side effects will have an increasingly large impact on society.</p><h3>Safety and predictability</h3><p>The autonomous vehicle industry is possibly the first (but certainly not the last) subfield to be thrust into the limelight over the safety of autonomous systems. The responsible peer-reviewed efforts of the first-party companies (e.g. <a href="https://waymo.com/safety/research/">Waymo</a>) are huge steps in the right direction, but that is certainly not the end of the story.</p><p>The robustness and multiplicity of solutions inherent to a modular structure (as we saw above) stand in stark contrast to the weakness of monolithic AI structures when subject to uncertainty (<a href="https://ieeexplore.ieee.org/document/10778107">Cummings</a>).</p><p>Intuitively, a modular architecture can be &#8220;debugged&#8221;, and its intermediate outputs can be logged and inspected. 
Just like a black box recording of an aircraft allows review of inputs made from the pilot to the machine, a modular structure allows insight into, and thresholding of, the function of individual modules:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LINj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LINj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 424w, https://substackcdn.com/image/fetch/$s_!LINj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 848w, https://substackcdn.com/image/fetch/$s_!LINj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 1272w, https://substackcdn.com/image/fetch/$s_!LINj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LINj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png" width="800" height="278" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:278,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Safety and Predictability&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Safety and Predictability" title="Safety and Predictability" srcset="https://substackcdn.com/image/fetch/$s_!LINj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 424w, https://substackcdn.com/image/fetch/$s_!LINj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 848w, https://substackcdn.com/image/fetch/$s_!LINj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 1272w, https://substackcdn.com/image/fetch/$s_!LINj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Safety via modularity</figcaption></figure></div><h3>Energy</h3><p>While mechanical work done by robots necessarily needs energetic input (and the conversion efficiency can be <a href="https://www.worldscientific.com/doi/abs/10.1142/9789814415958_0057">quite high</a>), the cost of computational work is nowhere close to the only known fundamental energetic limit based on <a href="https://en.wikipedia.org/wiki/Landauer%27s_principle">Landauer&#8217;s principle</a>.</p><p>Even as chips get more and more efficient, our appetite for computation outstrips those benefits, raising <a href="https://www.nature.com/articles/d41586-024-03408-z">continual</a> <a href="https://www.technologyreview.com/2024/12/13/1108719/ais-emissions-are-about-to-skyrocket-even-further/">concern</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!uomH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uomH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 424w, https://substackcdn.com/image/fetch/$s_!uomH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 848w, https://substackcdn.com/image/fetch/$s_!uomH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 1272w, https://substackcdn.com/image/fetch/$s_!uomH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uomH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png" width="1110" height="502" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:502,&quot;width&quot;:1110,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Energy&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Energy" title="Energy" srcset="https://substackcdn.com/image/fetch/$s_!uomH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 424w, https://substackcdn.com/image/fetch/$s_!uomH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 848w, https://substackcdn.com/image/fetch/$s_!uomH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 1272w, https://substackcdn.com/image/fetch/$s_!uomH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">&#8220;AI&#8217;s energy crisis&#8221;</figcaption></figure></div><p>As already recognized by biology, a growing community of researchers are exploiting the fact that <a href="https://dl.acm.org/doi/10.1145/3408062">modular neural networks reduce power consumption</a>.</p><h2>The case for compositionality</h2><p>Modularity comes with a price. The motor modules in humans have appeared over the (long) course of animal evolution, and the modular control structures developed for robots need to be hand-crafted. These processes are much less automatic, and <a href="https://en.wikipedia.org/wiki/Attention_Is_All_You_Need">need more work than</a> scaling a simple structure with more data. 
In fact, the importance of pushing for architectural progress may not be limited to robotics (<a href="https://thenextweb.com/news/meta-yann-lecun-ai-behind-human-intelligence">LeCun</a>).</p><p>Modularity also necessarily limits the space of usable methods and algorithms. For example, a modular controller reasoning with the equivalent of &#8220;motor modules&#8221; for a triple pendulum would never be able to accomplish this:</p><div id="youtube2-lbJfh0MOcp0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;lbJfh0MOcp0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/lbJfh0MOcp0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Nevertheless, the question of whether to abstract systems through modularity has come up before in other fields, such as digital VLSI and programming languages, and modularity has clearly won out, in part for the reasons discussed above.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TNPx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TNPx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 424w, 
https://substackcdn.com/image/fetch/$s_!TNPx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 848w, https://substackcdn.com/image/fetch/$s_!TNPx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 1272w, https://substackcdn.com/image/fetch/$s_!TNPx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TNPx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png" width="800" height="388" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:388,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Abstraction&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Abstraction" title="Abstraction" srcset="https://substackcdn.com/image/fetch/$s_!TNPx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 424w, 
https://substackcdn.com/image/fetch/$s_!TNPx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 848w, https://substackcdn.com/image/fetch/$s_!TNPx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 1272w, https://substackcdn.com/image/fetch/$s_!TNPx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Abstraction in computer engineering</figcaption></figure></div><p>We don&#8217;t yet have a generally accepted methodology or architecture in robotics that could serve as a foundation for symbolic behavior programming.</p><p>End-to-end deep neural networks have become a useful and generally accepted architecture without compositional properties, but neural networks are not necessarily incompatible with compositionality (<a href="https://direct.mit.edu/neco/article/35/3/413/114140/How-to-Represent-Part-Whole-Hierarchies-in-a">Hinton</a>, <a href="https://compositionalintelligence.github.io/pdfs/Marcus.pdf">Marcus</a>). For more on this topic, I highly recommend the proceedings of this workshop on <a href="https://compositionalintelligence.github.io/">The Challenge of Compositionality for AI</a>.</p><p>What is the path forward?</p><p>If we value the benefits of modularity discussed above, it will take more work to develop the correct architectures, but this work is essential if robotics is to become a true scientific discipline with predictable outcomes.</p>]]></content:encoded></item></channel></rss>