<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[min{power}]]></title><description><![CDATA[Explorations in computing and robotics focused on power-efficiency and safety -- personal posts by Avik De, robotics Ph.D. and founder]]></description><link>https://www.avikde.me</link><image><url>https://substackcdn.com/image/fetch/$s_!Z7FY!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png</url><title>min{power}</title><link>https://www.avikde.me</link></image><generator>Substack</generator><lastBuildDate>Fri, 08 May 2026 10:42:56 GMT</lastBuildDate><atom:link href="https://www.avikde.me/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Avik De]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[minpower@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[minpower@substack.com]]></itunes:email><itunes:name><![CDATA[Avik De]]></itunes:name></itunes:owner><itunes:author><![CDATA[Avik De]]></itunes:author><googleplay:owner><![CDATA[minpower@substack.com]]></googleplay:owner><googleplay:email><![CDATA[minpower@substack.com]]></googleplay:email><googleplay:author><![CDATA[Avik De]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How an LLM Changes its Mind]]></title><description><![CDATA[Safety and efficiency with universal approximators and Turing machines]]></description><link>https://www.avikde.me/p/how-an-llm-changes-its-mind</link><guid isPermaLink="false">https://www.avikde.me/p/how-an-llm-changes-its-mind</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Tue, 05 May 
2026 12:14:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ghz-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Deep neural networks are unlocking solutions to new classes of problems seemingly on a monthly or weekly basis. The capabilities of LLMs, coding assistants, and agents are very impressive, but it&#8217;s also easy to get a bit carried away about what they are actually doing when they are provocatively referred to as artificial intelligence. They are still algorithms, and it&#8217;s good to take a step back to look at the type of algorithm they actually are.</p><p>Fortunately, we know a lot about what deep neural networks represent. As a starting point, the <strong><a href="https://en.wikipedia.org/wiki/Universal_approximation_theorem">universal approximation theorem</a> (UAT)</strong> says that a <strong>feedforward neural network</strong> with at least one hidden layer can <strong>approximate any continuous function over a compact domain</strong> to any desired degree of accuracy, provided it has enough neurons and a non-linear activation function.</p><p>This raises a number of follow-up questions:</p><ul><li><p>What kinds of tasks are (not) solved by approximating a continuous function?</p></li><li><p>For this purpose, are transformers equivalent to feedforward neural networks, or do they do something different?</p></li><li><p>How do these map to computational hardware, like CPUs, GPUs, or NPUs?</p></li></ul><p>Answering these questions requires a review of what &#8220;computation&#8221; means, looking all the way back to the writings of Turing, Minsky, and Chomsky. 
In exchange we get some insights into the versatility as well as the energetic cost of current AI.</p><p>I&#8217;ll provide some answers to the first two questions in this post, and a detailed look at the last one in a follow-up.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h2>Universal Approximation</h2><p>The prototypical &#8220;feedforward neural network&#8221; from the UAT is a multi-layer perceptron (MLP). This is typically composed of linear layers (which multiply its inputs by a weighting matrix) and a nonlinear activation function.</p><p>In the plots below<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, we&#8217;re approximating a quasi-sinusoidal curve on the left and a square wave on the right using an MLP.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ghz-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ghz-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 424w, https://substackcdn.com/image/fetch/$s_!ghz-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 848w, 
https://substackcdn.com/image/fetch/$s_!ghz-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 1272w, https://substackcdn.com/image/fetch/$s_!ghz-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ghz-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png" width="516" height="276.0524781341108" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70167b78-b951-4a03-8815-19710a04b7d0_686x367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:686,&quot;resizeWidth&quot;:516,&quot;bytes&quot;:61200,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/196138155?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ghz-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 424w, https://substackcdn.com/image/fetch/$s_!ghz-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 
848w, https://substackcdn.com/image/fetch/$s_!ghz-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 1272w, https://substackcdn.com/image/fetch/$s_!ghz-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70167b78-b951-4a03-8815-19710a04b7d0_686x367.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>With larger model width and depth:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" 
target="_blank" href="https://substackcdn.com/image/fetch/$s_!RSJR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RSJR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 424w, https://substackcdn.com/image/fetch/$s_!RSJR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 848w, https://substackcdn.com/image/fetch/$s_!RSJR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 1272w, https://substackcdn.com/image/fetch/$s_!RSJR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RSJR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png" width="519" height="276.3021582733813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:370,&quot;width&quot;:695,&quot;resizeWidth&quot;:519,&quot;bytes&quot;:55233,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/196138155?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!RSJR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 424w, https://substackcdn.com/image/fetch/$s_!RSJR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 848w, https://substackcdn.com/image/fetch/$s_!RSJR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 1272w, https://substackcdn.com/image/fetch/$s_!RSJR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4edc05-ec4c-4a42-ad48-e297d7e64573_695x370.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>You&#8217;ll notice that the square wave is much more difficult to approximate than the sinusoidal one. Why is that? 
If you recall from above, the UAT promised that the MLP would be good at approximating <strong>continuous functions</strong>, and the square wave has periodic discontinuities.</p><p>Before you think that this is some pedantic example that would never occur in practice, let me offer two more practical ones that are equivalent.</p><p>Suppose you have a drone flying through a forest of tall trees:</p><div id="youtube2-m89bNn6RFoQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;m89bNn6RFoQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/m89bNn6RFoQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The task is obstacle avoidance: the input is the front camera view, and the output we&#8217;d like is a path that won&#8217;t collide with a tree. 
In such a view, if the view changes <em>continuously</em> in such a way that a path becomes too narrow to pass through, the safe path must jump to a different one <em>discontinuously</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B9iV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B9iV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 424w, https://substackcdn.com/image/fetch/$s_!B9iV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 848w, https://substackcdn.com/image/fetch/$s_!B9iV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 1272w, https://substackcdn.com/image/fetch/$s_!B9iV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B9iV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png" width="491" height="201" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:201,&quot;width&quot;:491,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18687,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/196138155?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B9iV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 424w, https://substackcdn.com/image/fetch/$s_!B9iV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 848w, https://substackcdn.com/image/fetch/$s_!B9iV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 1272w, https://substackcdn.com/image/fetch/$s_!B9iV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b947c1d-8cc0-43f5-88f6-d5c99dffd074_491x201.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>An MLP would need to sample the input space very densely to quickly interpolate between the left path and the right one (as in the square wave example above). 
This carries a high model-size penalty, and the output still has to pass through an unsafe part of the output space along the way.</p><p>Another practical example is related to the title of the article. Assuming an LLM&#8217;s output is a token view of an internal reasoning state, &#8220;changing its mind&#8221; on a yes / no question requires a similar jump in its state. However, the internal computing machinery of a modern LLM, the transformer, is more complex than an MLP. We&#8217;ll look into the two categories separately below.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h2>Lookup Tables to Turing Machines</h2><p>The computational power guaranteed by the UAT is equivalent to that of a lookup table. A lookup table effectively pairs inputs and outputs so that it can &#8220;look up&#8221; the appropriate output when queried with an input. In continuous spaces, this can include some interpolation or extrapolation. The curve approximation figure above is a good visualization of this: the table would contain {x, y} entries. The compact domain condition of the UAT, together with a finite accuracy target, effectively ensures that the number of entries in the lookup table is finite.</p><p>At the other end of the complexity spectrum, we have a <a href="https://en.wikipedia.org/wiki/Turing_machine">Turing machine</a>: an automaton that has access to unbounded memory, and is able to make discrete decisions based on what is in its memory. While this may sound foreign, it is actually a very familiar concept. A CPU paired with almost any programming language is effectively a Turing machine (putting aside the implementation detail of potentially running out of memory). You can control a program&#8217;s flow using <code>if</code>, <code>while</code>, etc. 
and call subroutines, and with these building blocks, you can build any software that has ever been written.</p><p>It should be clear that a Turing machine can do fundamentally more than a lookup table:</p><ol><li><p>It can process an input that is arbitrarily large, which a lookup table cannot do. For example, you can <a href="https://en.wikipedia.org/wiki/Integer_factorization">very easily write</a> a CPU program that factorizes an integer, but you could never fit such an algorithm in a lookup table, since you could always input a larger integer. A more current example is a pre-transformer language model with a fixed-size input, which could not handle sequences of arbitrary length, and thus could not exhibit the level of capability we got with a GPT.</p></li><li><p>It can exhibit irregular flow control, like branching and jumping. In the &#8220;flying through forest&#8221; example above, it can do something like:</p></li></ol><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">if left_path_too_narrow:
    take_right_path()
else:
    take_left_path()</code></pre></div><p>While this looks benign, it is deeply connected to the continuity clause of the UAT. An MLP cannot exactly represent an algorithm that relies on this kind of branching to make a discontinuous or symbolic jump.</p><p>In the square wave example above, the MLP could still approximate the target, but only at the expense of a large number of parameters. In contrast, here&#8217;s an almost trivial program that produces the same square wave output with very few parameters:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;f5665b4d-538b-4906-806e-4237d44f3842&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">def square_wave(x):
    # x % 2 is the remainder of x / 2, so the sign flips on every unit interval
    if x % 2 &lt; 1:
        return 1
    else:
        return -1</code></pre></div><p>This shows the expressive power of a Turing machine compared to a lookup table. Adding a little structural or organizational complexity drastically reduced the number of required parameters.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h2>The Transformer Attention Mechanism</h2><p>We discussed earlier how the UAT only addresses a bounded input domain of fixed dimension. This is true in practice for MLPs as well: an MLP will typically be used to process a fixed image size or, in a transformer feedforward network, a fixed layer width.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>The attention mechanism of transformers is different. In an LLM, when a sequence of tokens is fed in, each token can attend to other tokens in the sequence, enabling a computation paradigm that can handle sequences of arbitrary length. This makes it different from a lookup table, because the input <em>dimension itself is unbounded.</em> In principle, you don&#8217;t need to retrain for longer sequences, since the same learned weights are applied regardless of sequence length.</p><p>In practical terms, a transformer&#8217;s sequence length has to be limited to a maximum context length to manage the mapping to computational hardware. By the same token, CPUs would also need unbounded memory to be true Turing machines.</p><p>So, are implementable transformers, like general-purpose CPU programs, Turing machines in all but the most pedantic terms?</p><p>Not quite &#8212; there&#8217;s still a fundamental gap that cannot be closed. Transformers are still continuous function approximators and cannot efficiently exhibit irregular flow control. 
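</p><p>To make the continuity point concrete, here is a minimal sketch, not a model of any real transformer or flight controller. It reuses the forest example under stated assumptions: hypothetical headings of -30 and +30 degrees for the two gaps, a tree assumed to sit straight ahead at 0 degrees, and a hypothetical sigmoid blend (<code>soft_choice</code>) standing in for a continuous approximator of the branch (<code>hard_choice</code>). All names, thresholds, and gains are made up for illustration.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import math

LEFT_HEADING = -30.0   # degrees; hypothetical heading of the left gap
RIGHT_HEADING = 30.0   # degrees; hypothetical heading of the right gap


def hard_choice(left_gap_width, min_width=1.0):
    """Branching controller: the heading jumps at the width threshold."""
    if left_gap_width >= min_width:
        return LEFT_HEADING
    return RIGHT_HEADING


def soft_choice(left_gap_width, min_width=1.0, gain=5.0):
    """Continuous stand-in for the branch: a sigmoid-weighted blend of headings."""
    w_left = 1.0 / (1.0 + math.exp(-gain * (left_gap_width - min_width)))
    return w_left * LEFT_HEADING + (1.0 - w_left) * RIGHT_HEADING


# Sweep the left gap width across the threshold and compare the two outputs.
for width in (1.2, 1.05, 1.0, 0.95, 0.8):
    print(width, hard_choice(width), round(soft_choice(width), 1))</code></pre></div><p>Exactly at the threshold, the blend returns a heading of 0 degrees, which in this toy setup points between the two gaps, while the branch commits to one side. Raising <code>gain</code> narrows the unsafe transition band but never removes it, which mirrors the parameter cost of sharpening the square wave approximation above.</p><p>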
A <a href="https://arxiv.org/pdf/2602.11175">2026 paper from Oracle AI</a> looks at discrete reasoning with transformers, and I&#8217;ll let it speak for itself:</p><blockquote><p>Through this synthesis, we provide readers with a cohesive understanding of why transformers succeed in interpolation tasks (e.g. summarization) but fall short in reliably executing symbolic algorithms.</p></blockquote><p>Symbolic algorithms are characterized by discontinuous outputs that present a challenge to transformers. As in the square wave example above, you can try to circumvent the issue by increasing model width or dataset size, but this comes at the cost of a much larger and less efficient model. Moreover, as the paper points out, as you compose symbolic tasks (task A &#8594; task B &#8594; &#8230;) the number of switching boundaries grows combinatorially.</p><p>For an LLM to change its mind on a yes / no answer, architecturally it needs to continuously interpolate through reasoning trajectories, traversed by generating (lots of) reasoning tokens.</p><h2>Closing Thoughts</h2><p>Deep neural networks can solve a huge variety of problems thanks to their universal function approximation ability. Transformers&#8217; ability to process sequences of arbitrary length advances them into a new computational category beyond lookup tables.</p><p>However, they are still not well suited to problems with symbolic or discontinuous outputs. Such outputs are common in problems involving safety or symbolic reasoning. In current successes of deep learning, solutions to these kinds of problems are attained in a fashion similar to the square wave approximation above &#8212; it works, but is extremely inefficient.</p><p>These problems could potentially be solved with much smaller models if the models had Turing machine-style universal computation capabilities. 
<span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Devansh&quot;,&quot;id&quot;:8101724,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48081c70-8afa-41e3-a44e-b0f917bc7577_1200x1600.jpeg&quot;,&quot;uuid&quot;:&quot;d12070ea-e64f-4be9-9406-3b5a437c91d8&quot;}" data-component-name="MentionToDOM">Devansh</span>&#8217;s article linked below advocates for the same thing, approaching it from the computational hardware perspective for some classes of problems. In a follow-up post, I&#8217;ll tie the first-principles analysis in this post to current computational hardware and discuss how different algorithm classes map onto it.</p><p>Thanks for reading!</p><p><em>If you enjoyed this post, please like (&#10084;&#65039;) and restack &#8212; it helps others find my writing. Subscribe to receive new posts. All of this is greatly appreciated.</em></p><div data-component-name="FragmentNodeToDOM"><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/how-an-llm-changes-its-mind/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/how-an-llm-changes-its-mind/comments"><span>Leave a comment</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p></div><h2>References and Further Reading</h2><p><a href="https://lifeiscomputation.com/transformers-are-not-turing-complete/">Are Transformers Turing-complete?</a> &#8212; Hessam Akhlaghpour (2024)</p><p><a href="https://arxiv.org/pdf/2602.11175">Barriers to 
Discrete Reasoning with Transformers</a> &#8212; Oracle AI (2026)</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:166288637,&quot;url&quot;:&quot;https://www.artificialintelligencemadesimple.com/p/the-great-compute-re-architecture&quot;,&quot;publication_id&quot;:1315074,&quot;publication_name&quot;:&quot;Artificial Intelligence Made Simple&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Pfon!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77504fa0-0f08-4a38-bbde-becb151d2db8_643x644.png&quot;,&quot;title&quot;:&quot;The Great Compute Re-Architecture: Why Branching &amp; Sparsity Will Define the Next Decade of Silicon [Breakdowns]&quot;,&quot;truncated_body_text&quot;:&quot;It takes time to create work that&#8217;s clear, independent, and genuinely useful. If you&#8217;ve found value in this newsletter, consider becoming a paid subscriber. It helps me dive deeper into research, reach more people, stay free from ads/hidden agendas, and supports my crippling chocolate milk addiction.&quot;,&quot;date&quot;:&quot;2025-06-19T01:36:32.021Z&quot;,&quot;like_count&quot;:57,&quot;comment_count&quot;:17,&quot;bylines&quot;:[{&quot;id&quot;:8101724,&quot;name&quot;:&quot;Devansh&quot;,&quot;handle&quot;:&quot;chocolatemilkcultleader&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48081c70-8afa-41e3-a44e-b0f917bc7577_1200x1600.jpeg&quot;,&quot;bio&quot;:&quot;The best meme-maker in Tech. Writer on AI, Software, and the Tech Industry. Currently in NYC Come say hi, I want more friends. 
&quot;,&quot;profile_set_up_at&quot;:&quot;2021-08-21T20:28:53.612Z&quot;,&quot;reader_installed_at&quot;:&quot;2024-03-11T12:27:10.271Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:1274217,&quot;user_id&quot;:8101724,&quot;publication_id&quot;:1315074,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:1315074,&quot;name&quot;:&quot;Artificial Intelligence Made Simple&quot;,&quot;subdomain&quot;:&quot;artificialintelligencemadesimple&quot;,&quot;custom_domain&quot;:&quot;www.artificialintelligencemadesimple.com&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Covering the important ideas in AI from all angles- technical, social, and economic. Read in over 200 countries.  Useful to everyone who wants to learn AI. Critical to anyone trying to see what happens next. Sister Publication to Tech Made Simple.&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77504fa0-0f08-4a38-bbde-becb151d2db8_643x644.png&quot;,&quot;author_id&quot;:8101724,&quot;primary_user_id&quot;:8101724,&quot;theme_var_background_pop&quot;:&quot;#009B50&quot;,&quot;created_at&quot;:&quot;2023-01-14T23:37:24.692Z&quot;,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Devansh&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;magaziney&quot;,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}},{&quot;id&quot;:109622,&quot;user_id&quot;:8101724,&quot;publication_id&quot;:108704,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:108704,&quot;name&quot;:&quot;Technology Made 
Simple&quot;,&quot;subdomain&quot;:&quot;codinginterviewsmadesimple&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Deep yet digestible insights about Computer Science, Programming Interviews, Software Engineering Careers, Machine Learning, and the Tech Industry for Tech Leaders. Amazing For Coders and Managers. Beneficial to anyone trying to make money in Tech. &quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/8546dc69-af46-4d5d-9a80-b66cb76c833b_644x644.png&quot;,&quot;author_id&quot;:8101724,&quot;primary_user_id&quot;:null,&quot;theme_var_background_pop&quot;:&quot;#45D800&quot;,&quot;created_at&quot;:&quot;2020-10-07T10:47:41.199Z&quot;,&quot;email_from_name&quot;:&quot;Devansh from Tech Made Simple&quot;,&quot;copyright&quot;:&quot;Devansh&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;magaziney&quot;,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}},{&quot;id&quot;:5366623,&quot;user_id&quot;:8101724,&quot;publication_id&quot;:5261101,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:5261101,&quot;name&quot;:&quot;What's Happening In Tech&quot;,&quot;subdomain&quot;:&quot;whatishappeningintechnology&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;A Newsletter meant to Help People Keep Up With What's Happening in 
Tech&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff955b89-d08e-4cb7-8add-709e6dc14d8e_1080x1080.jpeg&quot;,&quot;author_id&quot;:8101724,&quot;primary_user_id&quot;:null,&quot;theme_var_background_pop&quot;:&quot;#FF6719&quot;,&quot;created_at&quot;:&quot;2025-06-07T04:30:33.908Z&quot;,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Devansh&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;newspaper&quot;,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}}],&quot;twitter_screen_name&quot;:&quot;Machine01776819&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:1000,&quot;status&quot;:{&quot;bestsellerTier&quot;:1000,&quot;subscriberTier&quot;:1,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:{&quot;type&quot;:&quot;bestseller&quot;,&quot;tier&quot;:1000},&quot;paidPublicationIds&quot;:[618139,1238074,1442076],&quot;subscriber&quot;:null}}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://www.artificialintelligencemadesimple.com/p/the-great-compute-re-architecture?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!Pfon!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77504fa0-0f08-4a38-bbde-becb151d2db8_643x644.png" loading="lazy"><span class="embedded-post-publication-name">Artificial Intelligence Made Simple</span></div><div class="embedded-post-title-wrapper"><div 
class="embedded-post-title">The Great Compute Re-Architecture: Why Branching &amp; Sparsity Will Define the Next Decade of Silicon [Breakdowns]</div></div><div class="embedded-post-body">It takes time to create work that&#8217;s clear, independent, and genuinely useful. If you&#8217;ve found value in this newsletter, consider becoming a paid subscriber. It helps me dive deeper into research, reach more people, stay free from ads/hidden agendas, and supports my crippling chocolate milk addiction&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">a year ago &#183; 57 likes &#183; 17 comments &#183; Devansh</div></a></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The plots are generated from <a href="https://avikde.github.io/tiny-xpu/">this page</a> from the TinyXPU project, which you can read more about <a href="https://chipinsights.net/p/the-art-of-architectural-analysis">here</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>For a gentle introduction to transformers with a computer architecture framing, I&#8217;d recommend <a href="https://www.viksnewsletter.com/p/a-primer-on-transformer-architecture">Vik&#8217;s article</a>.</p></div></div>]]></content:encoded></item><item><title><![CDATA[The First Paradigm in Robotics & AI Research: Lessons from Computer Engineering]]></title><description><![CDATA[Commoditization and end-to-end learning have consolidated robotics and AI. 
What's next for research labs?]]></description><link>https://www.avikde.me/p/the-first-paradigm-in-robotics-and</link><guid isPermaLink="false">https://www.avikde.me/p/the-first-paradigm-in-robotics-and</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Wed, 29 Apr 2026 15:13:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4P2l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions">Thomas Kuhn wrote</a> that scientific fields develop into dominant <em>paradigms</em> that characterize phases of productive but incremental research. The very existence of a paradigm is evidence of the maturation of a field.</p><p>For robotics, we may be witnessing this for the first time.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> The start of our research careers resembled the &#8220;wild west&#8221; of emerging techniques and technologies, but ideas have since converged. On the hardware side, robots have gotten good enough that thousands are being shipped and used by consumers and researchers alike. On the algorithm side, the bitter lesson and its corollary &#8212; hypothesized &#8220;scaling laws&#8221; &#8212; have provided a scaffolding around which progress can be evaluated. <a href="https://itcanthink.substack.com/p/vision-language-action-models-and">End-to-end behavior cloning policies</a> seem like they can generalize to all sorts of tasks, and performance predictably improves with more data. 
We&#8217;ll refer to these two trends as <em>commoditization</em> and <em>architectural convergence</em>, and discuss how they shape the current paradigm below.</p><p>The establishment of this current paradigm has also had side-effects on the nature of research that may in themselves be setting us up for paradigm <em>shifts</em>. While it is a bit of an overreach to use the term &#8220;revolution&#8221; for robotics (as Kuhn did for science), such a shift would be pivotal for researchers and is worth understanding.</p><p><em>This article is co-written by </em><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Avik De&quot;,&quot;id&quot;:356074997,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;uuid&quot;:&quot;74d7f96b-7849-4218-ad74-d6ae4e18d101&quot;}" data-component-name="MentionToDOM"></span> <em>and </em><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Chris Paxton&quot;,&quot;id&quot;:232680664,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!13Dp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a886fd-347d-4694-b670-0253975d2ba9_659x547.png&quot;,&quot;uuid&quot;:&quot;f2090487-de58-433d-99ed-65a4350be474&quot;}" data-component-name="MentionToDOM"></span><em>, both robotics researchers with experience in academia as well as industry. 
Chris writes about AI and robotics, and Avik writes about robotics, computing, and AI.</em></p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:7287367,&quot;name&quot;:&quot;min{power}&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;base_url&quot;:&quot;https://www.avikde.me&quot;,&quot;hero_text&quot;:&quot;Explorations in computing and robotics focused on power-efficiency and safety -- personal posts by Avik De, robotics Ph.D. and founder&quot;,&quot;author_name&quot;:&quot;Avik De&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#ffffff&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://www.avikde.me?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png" width="56" height="56" style="background-color: rgb(255, 255, 255);"><span class="embedded-publication-name">min{power}</span><div class="embedded-publication-hero-text">Explorations in computing and robotics focused on power-efficiency and safety -- personal posts by Avik De, robotics Ph.D. 
and founder</div><div class="embedded-publication-author-name">By Avik De</div></a><form class="embedded-publication-subscribe" method="GET" action="https://www.avikde.me/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:2883266,&quot;name&quot;:&quot;It Can Think!&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!13Dp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a886fd-347d-4694-b670-0253975d2ba9_659x547.png&quot;,&quot;base_url&quot;:&quot;https://itcanthink.substack.com&quot;,&quot;hero_text&quot;:&quot;Robotics and AI; the future we're building and how we'll get there&quot;,&quot;author_name&quot;:&quot;Chris Paxton&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:&quot;#292524&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://itcanthink.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!13Dp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a886fd-347d-4694-b670-0253975d2ba9_659x547.png" width="56" height="56" style="background-color: rgb(41, 37, 36);"><span class="embedded-publication-name">It Can Think!</span><div class="embedded-publication-hero-text">Robotics and AI; the future we're building and how we'll get there</div><div class="embedded-publication-author-name">By Chris Paxton</div></a><form 
class="embedded-publication-subscribe" method="GET" action="https://itcanthink.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div><h2>Trends in Robotics and AI</h2><h3>1) Commoditization</h3><p>Going back to 2013, Avik&#8217;s Ph.D. research included the development of an internal research robot, Minitaur:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4P2l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4P2l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 424w, https://substackcdn.com/image/fetch/$s_!4P2l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 848w, https://substackcdn.com/image/fetch/$s_!4P2l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 1272w, https://substackcdn.com/image/fetch/$s_!4P2l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!4P2l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png" width="515" height="366.31799163179915" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:680,&quot;width&quot;:956,&quot;resizeWidth&quot;:515,&quot;bytes&quot;:838745,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/194565767?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4P2l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 424w, https://substackcdn.com/image/fetch/$s_!4P2l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 848w, https://substackcdn.com/image/fetch/$s_!4P2l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 1272w, https://substackcdn.com/image/fetch/$s_!4P2l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c9a075c-f936-4270-8785-4812da0b85e4_956x680.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>It took a lot of (Ph.D. student) effort to build the infrastructure, but resulted in a unique development platform that was easy to program (as an Arduino), lightweight and relatively safe (5 kg), and capable of producing very agile and exciting-looking behaviors. There was nothing like it that you could buy. 
All in all, this endeavor to develop a new robot led to <a href="https://www.avikde.me/p/vertical-hopper-compositions">papers</a>, cool movies to show in talks, and even <a href="https://www.ghostrobotics.io/">a startup company</a>.</p><p>In the decade after, four-legged robots started to get out of the research lab and into public consciousness. The show Silicon Valley had a <a href="https://www.businessinsider.com/silicon-valley-google-spot-robot-2016-4">Boston Dynamics Spot cameo in 2016</a>, and robot videos designed to appeal to a broad audience, like <a href="https://www.youtube.com/watch?v=kHBcVlqpvZ8">dancing</a>, started to appear. Four-legged robots were officially out of the lab and in the wild, and this raised expectations for what they should do. Walking around stably used to be cutting edge, but became table stakes. Expectations for specs such as reliability, battery life, compute capability, and ruggedness drove designs to be more complex. It became much more difficult for a couple of researchers with minimal engineering experience to put together a new robot. Moreover, after Chinese company Unitree entered the market and <a href="https://kr-asia.com/unitree-robotics-develops-personal-robot-dogs-that-jog-alongside-you">dropped the asking price by almost 30x</a> in 2021, it was no longer worth the time and money to even try.</p><p><strong>The pre-paradigm period of lab-developed robotic hardware is being replaced by algorithm development for commoditized hardware.</strong></p><p>We have seen this play out in several robotics research labs. DJI commoditized consumer drones aggressively from 2013 onward, making it hard to justify custom builds even for capability reasons. By the mid-2010s, labs doing serious flight research (e.g., <a href="https://rpg.ifi.uzh.ch/people_scaramuzza.html">Davide Scaramuzza&#8217;s group</a> at the University of Zurich) were exclusively using commercial platforms. 
ETH Zurich&#8217;s <a href="https://rsl.ethz.ch/research/researchtopics/legged-locomotion.html">Robotic Systems Lab</a> (which built ANYmal originally, and also StarlETH) now deploys its locomotion research on the ANYmal platform rather than building new hardware. <a href="https://bostondynamics.com/blog/what-makes-an-effective-research-robot/">Boston Dynamics has an article</a> that talks about how commercial platforms let researchers hit the ground running.</p><p>Post-commoditization, researchers who want to demonstrate <em>algorithms</em> working on robots can reap the benefits. Humanoid research circa 2015 meant figuring out &#8220;how do we actually build these things and make them not fall over,&#8221; whereas post-commoditization, time can be spent on higher-level algorithms and methods &#8212; we refer to this phenomenon as &#8220;<strong>moving up the stack</strong>.&#8221;</p><p>A secondary effect of commoditization is that <em>parts</em> are now easier to get, and researchers can put together novel modular combinations of more mature components. The WidowX 250 Dynamixel-based arm from Trossen Robotics has become the default low-cost manipulation platform because it is cheap (~$3k) and can be used to create &#8220;leader-follower&#8221; setups for data collection. The <a href="https://arxiv.org/abs/2304.13705">ALOHA paper</a> notes that the whole system with two arms costs ~$20k off the shelf. More recently, we have seen <a href="https://yourownrobot.ai/">robots like the YOR</a> assembled from off-the-shelf parts for research purposes. This effect enables new types and form factors of robots to be built &#8212; <em>we will return to this in the next section</em>.</p><p>The same trend applies to non-hardware <strong>AI research</strong>. Frontier language models cannot really be trained by academic research labs anymore &#8212; research in these areas is moving to fine-tuning commercial models instead. 
The following plots <a href="https://github.com/avikde/robo-research-trends">were generated from arXiv data</a> and confirm these trends toward pretrained model usage in research compared to building them from scratch.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ry1_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ry1_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 424w, https://substackcdn.com/image/fetch/$s_!Ry1_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 848w, https://substackcdn.com/image/fetch/$s_!Ry1_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 1272w, https://substackcdn.com/image/fetch/$s_!Ry1_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ry1_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png" width="384" height="288" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:600,&quot;resizeWidth&quot;:384,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ry1_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 424w, https://substackcdn.com/image/fetch/$s_!Ry1_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 848w, https://substackcdn.com/image/fetch/$s_!Ry1_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 1272w, https://substackcdn.com/image/fetch/$s_!Ry1_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d0c81fa-2963-4f6d-aa12-9f030a92a603_600x450.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VfeU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VfeU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 424w, https://substackcdn.com/image/fetch/$s_!VfeU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 848w, 
https://substackcdn.com/image/fetch/$s_!VfeU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 1272w, https://substackcdn.com/image/fetch/$s_!VfeU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VfeU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png" width="388" height="291" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:600,&quot;resizeWidth&quot;:388,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VfeU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 424w, https://substackcdn.com/image/fetch/$s_!VfeU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 848w, 
https://substackcdn.com/image/fetch/$s_!VfeU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 1272w, https://substackcdn.com/image/fetch/$s_!VfeU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f67c01-9553-4d2c-9903-15f7106bddea_600x450.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The Qwen series of models by Alibaba has nearly taken over the research world by facilitating fine-tuning. 
In 2026, no academic would think of training their own language models or even vision-language models from scratch &#8212; why would you, when Qwen 3.5 can already beat anything that&#8217;s within reach of an ordinary academic lab?</p><p>Just as with robotics hardware, <strong>the pre-paradigm period of lab-developed models is being replaced by fine-tuning commercial models</strong>.</p><p>Here as well, there are research ideas that can be pursued by <strong>moving up the stack</strong>: agentic reasoning, reinforcement learning, world representations, novel model architectures, etc. Robotics models are not like language models; there are fewer real-world benchmarks, and it seems that even within the domain of end-to-end deep learning there are plenty of ideas left unexplored.</p><h3>2) Architectural Convergence</h3><p>Labs used to have a narrower focus where they could carve out a niche, such as computer vision or legged locomotion. However, for a robot to demonstrate complex sensorimotor tasks, you need <a href="https://open.substack.com/pub/minpower/p/the-architecture-behind-end-to-end">all of the Sense-Plan-Act functions implemented in some way</a>. If you subscribe to the bitter lesson, even the best computer vision algorithm, when connected using hand-crafted interfaces to a planner and other downstream systems, cannot compete with end-to-end systems. 
General-purpose manipulation / locomotion research is <a href="https://itcanthink.substack.com/p/interesting-directions-in-vision">converging on behavior cloning and VLAs</a> since it works well enough across many tasks, and performance improves with larger models and more data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qAhH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qAhH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 424w, https://substackcdn.com/image/fetch/$s_!qAhH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 848w, https://substackcdn.com/image/fetch/$s_!qAhH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 1272w, https://substackcdn.com/image/fetch/$s_!qAhH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qAhH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png" width="1456" height="934" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:934,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qAhH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 424w, https://substackcdn.com/image/fetch/$s_!qAhH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 848w, https://substackcdn.com/image/fetch/$s_!qAhH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 1272w, https://substackcdn.com/image/fetch/$s_!qAhH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5518ef64-6a10-4f1d-8017-ddd845e0988b_1472x944.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Behavior cloning with VLAs (source: <a href="https://www.pi.website/research/human_to_robot">Physical Intelligence</a>)</figcaption></figure></div><p>This trend has pushed many previously-diverse labs toward developing end-to-end models, which is a significant reduction in the diversity and richness in the research ecosystem. For better or worse, we appear to solidly be in a <strong>paradigm of behavior cloning with end-to-end models</strong>.</p><p>This has several benefits for researchers: they can build on existing work easily without re-inventing the wheel, and it creates a scaffolding for new contributions. However, it also has the side-effect of suppressing other schools of thought. In Kuhn&#8217;s somewhat ominous words,</p><blockquote><p>But there are always some men who cling to one or another of the older views, and they are simply read out of the profession, which thereafter ignores their work. The new paradigm implies a new and more rigid definition of the field. 
Those unwilling or unable to accommodate their work to it must proceed in isolation or attach themselves to some other group.</p></blockquote><p>How do research labs and out-of-paradigm ideas stand out in the face of homogenization and consolidation in this paradigm? We discuss what we can learn from computer engineering in the next section.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h2>What we can learn from Computer Engineering</h2><p>By necessity, computer engineering has always been a bit ahead of robotics on the same technology curve. After all, we needed the chips to perform the computations that make robots work.</p><p>We saw a similar <strong>commoditization</strong> trend there, with hardware complexity outgrowing what a research lab could build. The initial university fab era was anchored by <a href="https://en.wikipedia.org/wiki/VLSI_Project">DARPA&#8217;s VLSI Project</a>, which produced BSD Unix, the RISC concept, and MOSIS (a shared fabrication service for academia). Once that era ended, academic research pivoted to what could be done without a fab.</p><p>In response, computer engineering offers a good set of examples of <strong>moving up the stack</strong> (transistors &#8594; meta-design tools and ISAs). Circa 2010, rather than building chips, Krste Asanovi&#263;&#8217;s group at Berkeley <a href="https://people.eecs.berkeley.edu/~krste/papers/EECS-2014-146.pdf">designed the open RISC-V ISA</a>, explicitly motivated by the problem of proprietary architectures impeding academic research. 
With <a href="https://github.com/chipsalliance/chisel">Chisel</a> (Berkeley), academics built better tools for designing chips, by expressing hardware designs in a high-level language, and it became the foundation for most RISC-V implementations.</p><p>In addition, CPU architectures converged to x86 for desktop and ARM for mobile because they worked well enough for most workloads, and design costs could be amortized across different applications &#8212; a <strong>general-purpose computing paradigm</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ncaw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ncaw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 424w, https://substackcdn.com/image/fetch/$s_!Ncaw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 848w, https://substackcdn.com/image/fetch/$s_!Ncaw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 1272w, https://substackcdn.com/image/fetch/$s_!Ncaw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ncaw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png" width="583" height="393.6105675146771" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:690,&quot;width&quot;:1022,&quot;resizeWidth&quot;:583,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ncaw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 424w, https://substackcdn.com/image/fetch/$s_!Ncaw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 848w, https://substackcdn.com/image/fetch/$s_!Ncaw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 1272w, https://substackcdn.com/image/fetch/$s_!Ncaw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19335b79-16cb-45ea-89b5-3274e08984e3_1022x690.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Performance saturation from the end of Dennard scaling (source: H&amp;P 2017 lecture).</figcaption></figure></div><p><a href="https://dl.acm.org/doi/10.1145/3282307">Hennessy and Patterson&#8217;s 2017 Turing Award lecture</a> argued that the post-Dennard-scaling era opens up a new window for research in domain-specific accelerators, where the design space is exploratory again. 
Coincident with the success of deep neural networks, the last few years have seen a <a href="https://thechipletter.substack.com/p/ai-accelerators-the-cambrian-explosion">Cambrian explosion in AI accelerators</a>, ushering in much more innovation in computer architecture and silicon than was possible in CPUs.</p><p>In other words, computer engineering&#8217;s paradigm shift resulted in <strong>domain-specific diversification.</strong></p><p>How do these apply to robotics and AI?</p><p>Just as chip fabrication leaving academia didn&#8217;t end computer architecture research, robotics research will find a home in core algorithms, training methodologies, and novel architectures<strong>.</strong> While papers can continue to be written on new methods and algorithms, unfortunately, the flashy demonstrations (important for fundraising and PR) may go out of lab reach. Similar to how ChatGPT capitalized on published transformer research, companies will capitalize on published public-domain research. It may become crucial to have a credit mechanism for academics for commercial usage of their work (this is not covered by academic metrics such as h-index).</p><p>The largest robotics companies are converging on general-purpose humanoids, optimizing for the broadest possible applicability and commercial value. By analogy to computer engineering&#8217;s <strong>domain-specific diversification</strong>, the next productive frontier for academic labs may be task-specific robots: surgical, agricultural, soft robots, etc., which diverge enough from general-purpose designs to make bespoke solutions worthwhile. 
The commoditization of hardware components (actuators, IMUs, and perception systems like the Kinect) has the positive side-effect of facilitating this kind of development.</p><h2>The Future</h2><p>While the external perception of robotics and AI research is that we are undergoing a revolution today, the internal view is more consistent with <em>commoditization</em> and <em>convergence</em>. This paradigm has had a lot of positive side-effects, like establishing a framework and shared infrastructure, but also some serious downsides, like stifling research that doesn&#8217;t fit the mold.</p><p>In response, we already see the reality of robotics research <strong>moving up the stack</strong>, and we will potentially begin to see examples of <strong>domain-specific diversification</strong> if the largest companies with the largest datasets corner the end-to-end behavior cloning approach.</p><p>Beyond that, it&#8217;s too early to predict if there is a paradigm shift coming. Kuhn says on this topic:</p><blockquote><p>Sometimes a normal problem, one that ought to be solvable by known rules and procedures, resists the reiterated onslaught of the ablest members of the group within whose competence it falls. 
On other occasions a piece of equipment designed and constructed for the purpose of normal research fails to perform in the anticipated manner, revealing an anomaly that cannot, despite repeated effort, be aligned with professional expectation.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YrAc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YrAc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 424w, https://substackcdn.com/image/fetch/$s_!YrAc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 848w, https://substackcdn.com/image/fetch/$s_!YrAc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 1272w, https://substackcdn.com/image/fetch/$s_!YrAc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YrAc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png" width="233" height="229" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:229,&quot;width&quot;:233,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YrAc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 424w, https://substackcdn.com/image/fetch/$s_!YrAc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 848w, https://substackcdn.com/image/fetch/$s_!YrAc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 1272w, https://substackcdn.com/image/fetch/$s_!YrAc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116dcd4e-6c38-4f02-88c6-c2b237be5d36_233x229.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Kuhn cycle (<a href="https://www.thwink.org/sustain/glossary/KuhnCycle.htm">source</a>)</figcaption></figure></div><p>Will there be a &#8220;piece of equipment&#8221; or &#8220;normal problem&#8221; whose unexpected result paves the way for the next robotics revolution? 
Optimistically, it seems like the current paradigm still has legs for a little while longer, but there is already work at the fringes looking toward the next set of leaps, like world model research, neuromorphic computing, etc. We&#8217;ll be writing about these topics over the coming weeks and months; stay tuned!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><p><em>If you enjoyed this post, please like (&#10084;&#65039;) and restack &#8212; it helps others find my writing. Subscribe to receive new posts. All of this is greatly appreciated.</em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>While originally intended for scientific fields, the <a href="https://www.sciencedirect.com/science/article/abs/pii/0048733382900166">idea has been extended</a> to broader technological fields.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Honor's humanoid ran the fastest half-marathon: how did they do it?]]></title><description><![CDATA[Engineering isn't magic, it's a matter of tradeoffs]]></description><link>https://www.avikde.me/p/honors-humanoid-ran-the-fastest-half</link><guid isPermaLink="false">https://www.avikde.me/p/honors-humanoid-ran-the-fastest-half</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Wed, 22 Apr 2026 20:11:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!S69N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>Robotics headlines over the past week have been dominated by the news that the <a href="https://www.cnn.com/2026/04/19/china/china-robot-half-marathon-intl-hnk">Honor Lightning humanoid robot has beaten the human half marathon world record</a> for the first time. It&#8217;s important to remember that machines and humans have very different capabilities and constraints, so why should we ever have expected the half marathon time for a robot and human to be related? Down the line, I don&#8217;t expect this particular comparison of human to machine to be very relevant. Nevertheless, it&#8217;s still an important milestone for engineering, just like <a href="https://en.wikipedia.org/wiki/Deep_Blue_versus_Garry_Kasparov">Deep Blue&#8217;s 1997 defeat of Garry Kasparov in chess</a>. From a human standpoint, I hope we can resist comparing the accomplishments of machines to the well-earned and deserved achievements of humans&#8230; maybe the chess model is a reasonable one here. Also as in the chess case, where Deep Blue couldn&#8217;t physically move the pieces, the Honor robot&#8217;s capabilities are much more narrow than a human running elbow-to-elbow with other runners, effortlessly navigating the course without GPS, etc. Comparing the robot runner to a human runner is just an apples to oranges comparison.</p><p>What <em>is</em> a good comparison is this performance to last year&#8217;s, when the best robot time was over 160 minutes, or more than 3x this year&#8217;s time. That&#8217;s a remarkable improvement in one year. My doctoral thesis involved <a href="https://www.avikde.me/p/phd-defense">building and controlling hopping and running robots</a>, and <a href="https://www.avikde.me/p/ghost-robotics-minitaur">since then I&#8217;ve tried to design and build efficient commercial legged robots</a>, giving me a decent idea of the constraints involved. 
So, in this article I wanted to try and examine &#8212; how did they do it? Is there some magical technology or technique that unlocked this performance? How did they beat the significantly better-known Unitree (who reportedly had to supply an <a href="https://x.com/TheHumanoidHub/status/2045702643449037287">ice pack backpack</a> to try and complete the race without overheating)? Could a western robot have won?</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><em>This publication and this post contain the author&#8217;s personal thoughts and opinions only, and do not reflect the views of any companies or institutions.</em></p><h2>The basic physics of hopping and running</h2><p>Hopping, very simply, consists of alternating phases of a leg pushing against the ground (&#8220;stance phase&#8221;) and the body flying through the air (&#8220;aerial phase&#8221;).</p><p>In aerial phase, the body simply free-falls (constant acceleration due to gravity). You can think of this as losing vertical momentum. In stance phase, the job of the leg is to push against the ground to reverse this vertical momentum. 
The job of the &#8220;knee&#8221; actuator is primarily to generate this force in stance phase.</p><p>The other basic leg function is repositioning for the next foothold. In bipedal running, while one leg is pushing against the ground, the other leg is swinging to reposition for the next step. The job of the &#8220;hip&#8221; actuator is primarily to swing the leg forward.</p><p>Bipedal running is simply these two functions alternating in the two legs &#8212; while the left leg pushes against the ground, the right leg swings forward, and vice versa. Of course, this is an oversimplification in many ways, but it still captures the main effects that contribute to running energetics. Namely, it becomes clear that:</p><ul><li><p>the knee actuator must produce enough torque to reverse the robot&#8217;s entire vertical momentum within the stance duration <em>T<sub>s</sub></em></p></li><li><p>the hip actuator must produce enough power to accelerate the leg forward within the swing duration <em>T<sub>sw</sub></em></p></li></ul><p>A robot runs faster by increasing its stride length and/or shortening its stance duration.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RCSB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RCSB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 424w, https://substackcdn.com/image/fetch/$s_!RCSB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 848w, 
https://substackcdn.com/image/fetch/$s_!RCSB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 1272w, https://substackcdn.com/image/fetch/$s_!RCSB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RCSB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png" width="494" height="277.5809523809524" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:354,&quot;width&quot;:630,&quot;resizeWidth&quot;:494,&quot;bytes&quot;:44169,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/194835386?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RCSB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 424w, https://substackcdn.com/image/fetch/$s_!RCSB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 
848w, https://substackcdn.com/image/fetch/$s_!RCSB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 1272w, https://substackcdn.com/image/fetch/$s_!RCSB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bb499d-9875-46ab-bb45-0d4e0f7c8955_630x354.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">A depiction of a single-leg hopper&#8217;s stance phase showing the reversal of vertical momentum and the 
maintenance of horizontal momentum, as well as the stride length and the stance duration. Source: &#8220;Legged Robots That Balance&#8221;.</figcaption></figure></div><p>Shortening the stance duration requires more knee torque to accomplish the same momentum reversal. Swinging the leg faster and covering a longer stride length both require more torque and power from the hip actuator.</p><p>And just like that, with very basic physics, we&#8217;ve recovered the dependence of running speed on the torque and power produced by the actuators.</p><h2>The basic physics of motors</h2><p>Electric motors dissipate heat in a fixed relation to the torque they produce, and the two quantities are related by the aptly named <em>motor constant</em>, <em>K<sub>m</sub></em>. If <em>&#964;</em> is the torque produced by the motor and <em>Q</em> is the rate at which it produces heat,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;K_m := \\frac{\\tau}{\\sqrt{Q}}&quot;,&quot;id&quot;:&quot;FTAAGSQRRL&quot;}" data-component-name="LatexBlockToDOM"></div><p>Equivalently, <em>Q</em> = (<em>&#964;</em>/<em>K<sub>m</sub></em>)<sup>2</sup>, so doubling the torque quadruples the heat.</p><p>In the &#8220;New Motor Models&#8221; section in <a href="https://repository.upenn.edu/entities/publication/10b266fd-41d2-49b6-ac90-0ee614bca00a">my thesis (2017)</a> I described how a <em>K<sub>m</sub></em> scaling relation can be approximated from rough first-principles geometry arguments. In particular, for a fixed length scale, <em>K<sub>m</sub></em> scales with the square root of motor mass &#8730;<em>m</em>. 
In a <a href="https://robot-daycare.com/posts/actuation_series_1/">recent post</a>, longtime blogger and roboticist Ben Katz generalizes this scaling and gives the coefficient a name, the &#8220;figure of merit (FoM),&#8221; which we can use here:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathrm{FoM}:=\\frac{K_m}{r\\sqrt{m}}&quot;,&quot;id&quot;:&quot;ICOKXYEZKV&quot;}" data-component-name="LatexBlockToDOM"></div><p>The <em>r</em> above is the motor diameter. To estimate motor mass <em>m</em>, I decided to relate it to the motor diameter and (unknown) length. With these, and assuming a high but reasonable FoM of 15, we can extrapolate the likely <em>K<sub>m</sub></em>.</p><p>To estimate the rotor inertia, we can relate it to the motor mass <em>m</em> and <em>r</em> as <em>j ~ mr<sup>2</sup></em>, as Ben Katz also does.</p><p>Adding a geartrain (gear ratio <em>G</em>) after the motor amplifies its torque and reduces its speed by a factor of <em>G</em>. So, it helps with torque production, but it has a very deleterious effect in legged systems when accelerating. Since the rotor of the motor itself has to spin faster, the rotor inertia <em>j</em>, reflected to the output, appears scaled up to <em>G<sup>2</sup>j</em>, which can quickly become very large. Thus, a small motor with large gearing becomes very sluggish at accelerating its output, even if it can statically produce a large torque. This is obviously bad for the &#8220;swing phase&#8221; described above.</p><h2>The Honor Lightning&#8217;s technology</h2><p>There isn&#8217;t a technical report on this robot as far as I know, but some online articles list a few specifications. I referred to <a href="https://chinaresearchcollective.substack.com/p/honors-autonomous-humanoid-robot">this Substack article</a> for this post. A couple of notes:</p><ul><li><p>This article and a few others say that the robot has 55 joints, but that is definitely a mistake. 
Potentially with hands (that were not equipped on these half-marathon versions) it could have 55 joints, but as deployed, they probably had closer to half as many joints.</p></li><li><p>The page also lists &#8220;Leaderdrive&#8221; as a harmonic reducer technology partner implying that strain wave gearing was used. However, based on the analysis below, a lower reduction-ratio planetary or another type of gearing is more appropriate, especially for this kind of efficiency-critical application.</p></li></ul><h3>Actuation: motor, gearing, gait</h3><p>These three factors are all interrelated and have an effect on how much energy is required and how much heat is produced. To see how, let&#8217;s start with the motor.</p><p>Typically, the motor <em>K<sub>m</sub></em> can be found in the datasheet, but in this case there&#8217;s no public reporting on the motor specs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S69N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S69N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 424w, https://substackcdn.com/image/fetch/$s_!S69N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 848w, https://substackcdn.com/image/fetch/$s_!S69N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 1272w, 
https://substackcdn.com/image/fetch/$s_!S69N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!S69N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png" width="860" height="573" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:573,&quot;width&quot;:860,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:683697,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/194835386?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!S69N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 424w, https://substackcdn.com/image/fetch/$s_!S69N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 848w, 
https://substackcdn.com/image/fetch/$s_!S69N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 1272w, https://substackcdn.com/image/fetch/$s_!S69N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d76f0f-f7a7-4a83-bf75-89e56c57428b_860x573.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The Honor Lightning robot. 
Source: CNN.</figcaption></figure></div><p>However, we can see the size of the fairly large hip/knee motors attached to the upper leg (my rough estimation is that the outer diameter is somewhere between 110-150mm from the image above). We can look at a couple of potential options: first, a reasonable 115mm diameter catalog motor, which I chose from TQ&#8217;s frameless motor catalog for similar reasons to Ben Katz&#8217;s blog post &#8212; they are well-documented and have a large selection. Second, we can use the scaling principles to make some reasonably good approximations of <em>K<sub>m</sub></em> for a hypothetical larger motor. I extrapolated to a 150x25 sized motor to obtain a <em>K<sub>m</sub></em> of 1.52 Nm/sqrt(W), and a mass of almost 2 kg.</p><p>Since we don&#8217;t know the gear ratio, we can use our simple physics model (script linked in references below) to estimate the power consumption for running for the &#8220;small&#8221; and &#8220;big&#8221; motors above as a function of <em>G</em>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DT76!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DT76!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 424w, https://substackcdn.com/image/fetch/$s_!DT76!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 848w, 
https://substackcdn.com/image/fetch/$s_!DT76!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 1272w, https://substackcdn.com/image/fetch/$s_!DT76!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DT76!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png" width="380" height="314.25219941348973" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:564,&quot;width&quot;:682,&quot;resizeWidth&quot;:380,&quot;bytes&quot;:85346,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/194835386?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DT76!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 424w, https://substackcdn.com/image/fetch/$s_!DT76!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 
848w, https://substackcdn.com/image/fetch/$s_!DT76!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 1272w, https://substackcdn.com/image/fetch/$s_!DT76!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f426974-4bd6-496a-9abb-5f2bf4720cf4_682x564.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Note that:</p><ul><li><p>A high gear ratio is nice to minimize the power in the knee actuator (since its job of supporting the 
robot weight is made easier with mechanical advantage), but a high gear ratio also makes the leg swing energetically difficult.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> There&#8217;s usually a middle-ground optimum.</p></li><li><p>The larger motor (150x25) prefers a smaller gear ratio (~23:1), and the smaller motor (115x25) prefers a higher gear ratio (~40:1).<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> These are shown with the dashed gray lines.</p></li><li><p>Both of these options seem to be able to accomplish the basic push-ground and leg-swing functions, with modest robot power consumption of 400-500 W.</p></li></ul><p>So, in sum, the <strong>motor is not magical technology</strong> and in fact, a range of existing or projected options would work, <em>when appropriately sized for this task.</em><strong> </strong>I&#8217;ll get back to this last bit and the green lines in the plot later.</p><p>The dissipated knee power (which is typically the main thermal limiting factor) is ~150W for both solutions. This is almost an unavoidable consequence &#8212; due to the predictable scaling of motor <em>K<sub>m</sub></em>, running at human speeds with a humanoid-sized robot will inevitably generate this amount of heat!</p><p>This, finally, is where we would see a potentially large difference between the two motors. Motor cooling is affected by the surface area over which heat removal can occur, and the larger motor has 70% more surface area. 
Even so, over a prolonged period, 150 W is a large amount of power to dissipate from a single motor, and this is where one of the stated innovations in this robot design appears to come into play (<a href="https://eu.36kr.com/en/p/3775418378027520">source</a>):</p><blockquote><p>According to Honor, the liquid - cooling pipes penetrate deep into the motors like capillaries. The high - power liquid pump has a heat - exchange flow rate of more than 4 liters per minute. Each of the four drive motors in the lower limbs is equipped with an independent liquid - cooling circuit.</p></blockquote><p>Liquid cooling is not new, but it&#8217;s definitely not what I would call a commodity. It has shown up in research periodically, and on the commercial side <a href="https://apptronik.com/news-collection/apptronik-readies-its-humanoid-robot-for-a-summer-unveil">Apptronik tried it for a few of their prototypes</a> but (to my knowledge) does not use it on their main Apollo platform. While it definitely is not magical technology, it has been niche so far. As described above, it is absolutely essential (and so far quite challenging) to be able to dissipate ~150 W from a motor for running at these speeds. In that respect, the <strong>liquid cooling tech is a key enabler</strong> of this type of performance.</p><p><strong>Caveat: </strong>The script I used to generate the plots above makes a lot of simplifying approximations. It doesn&#8217;t capture the energy dissipated in other motors (arms, ankles, abduction, etc.). The basic physics principles don&#8217;t lie about the periodic center-of-mass behavior, but the model doesn&#8217;t capture other oscillations in orientation as the body sways, or losses like friction or air resistance. The inertia of the leg is left out of the swing inertia calculation, since there is no way to approximate it properly with the information available. Published materials emphasize a lightweight leg construction, which indicates that the reflected rotor inertia will likely dominate it (and so the script&#8217;s approximation is likely good). There are more accurate ways to estimate the swing energetics incorporating the leg kinematics and swing trajectory, but I did not want to increase the complexity of this analysis and chose to err on the side of simplicity. Still, I think the main estimates and talking points (motor / gearing selection for the knee motor, and power dissipated in it) can be trusted.</p><h3>AI and autonomy</h3><p>There&#8217;s nothing to write home about here. The gait controller could have been either a reinforcement learning (RL) controller, which is easy to train for flat ground, or a model-based controller. The autonomous navigation system used a provided GNSS system and just had to follow the route waypoints. 
This is all very well-understood technology.</p><h3>Battery</h3><p>Let&#8217;s assume that the battery was chosen to last 1.5 hours (the robot finished in &lt; 1 hour). For 600 W consumption (based on the figures above with some buffer), the battery would have needed a capacity of 900 Wh (600 W &#215; 1.5 h), and at 300 Wh/kg energy density, the pack would have weighed 3 kg. This is well within reason for a 45 kg robot. Additionally, a 1.5 hour discharge time corresponds to a 1/1.5 &#8776; 0.67C discharge rate, which is well within the ratings of most existing batteries.</p><p>The Unitree H1 reportedly needed &#8220;<a href="https://www.instagram.com/p/DXZV1x9DEAp/">pit stops</a>&#8221; and ice for battery cooling, indicating that it was consuming much more power. We&#8217;ll talk about that more next.</p><h2>Engineering always involves tradeoffs</h2><p>Engineering is always characterized by tradeoffs &#8212; that&#8217;s what makes it challenging but also fun. Especially today with ever-stronger AI language models, the very human skill of judgment and knowing how to make tradeoffs is much more important than the rote work of completing a design to spec.</p><p>Even with the very simple model above, it was not that complex to roughly design a drivetrain that is theoretically capable of this feat. 
Then why did the competitors in the race, including more <a href="https://www.forbes.com/sites/johnkoetsier/2026/01/09/top-10-humanoid-robot-companies-by-shipments-revealed/">established and widely-shipped humanoids</a> such as from Unitree or Agibot, not compete as well?</p><p>We can use the simple model to generate an equivalent energetics plot for walking at 1.5 m/s, a much more modest but potentially more common activity for a commercial humanoid robot:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Gxy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Gxy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 424w, https://substackcdn.com/image/fetch/$s_!5Gxy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 848w, https://substackcdn.com/image/fetch/$s_!5Gxy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 1272w, https://substackcdn.com/image/fetch/$s_!5Gxy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5Gxy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png" width="422" height="351.45671641791046" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:558,&quot;width&quot;:670,&quot;resizeWidth&quot;:422,&quot;bytes&quot;:84359,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/194835386?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Gxy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 424w, https://substackcdn.com/image/fetch/$s_!5Gxy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 848w, https://substackcdn.com/image/fetch/$s_!5Gxy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 1272w, https://substackcdn.com/image/fetch/$s_!5Gxy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae713be7-bffb-4af2-871c-0433bdaaf6da_670x558.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The gray lines are as before &#8212; gear ratios optimized for half-marathon running. The green lines mark the gear ratios where power is minimized for walking, and they are significantly different!</p><p>Let&#8217;s say you design your robot to excel at the normal walking task and choose the green gear ratios. Running a half marathon with that green design consumes &gt; 300 W of knee motor power, more than 2x what we had with the running-optimized gray designs. 
It wouldn&#8217;t be so surprising to need ice packs!</p><p>Conversely, the running-optimized gray design, when used for the walking task, wastes significantly more motor power than the green designs (as seen from where they intersect the blue curves). We couldn&#8217;t model this effect with the information available, but using larger motors sized for running also increases the weight of the robot and constantly wastes power when it isn&#8217;t running at full speed. You can visually see the difference in motor sizes between the Unitree H1 and Honor Lightning:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kC_0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kC_0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 424w, https://substackcdn.com/image/fetch/$s_!kC_0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 848w, https://substackcdn.com/image/fetch/$s_!kC_0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 1272w, https://substackcdn.com/image/fetch/$s_!kC_0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!kC_0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png" width="1456" height="1285" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1285,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2183576,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/194835386?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kC_0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 424w, https://substackcdn.com/image/fetch/$s_!kC_0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 848w, https://substackcdn.com/image/fetch/$s_!kC_0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 1272w, https://substackcdn.com/image/fetch/$s_!kC_0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe087350a-1321-41e6-b2e6-ac50405051e9_2012x1776.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The larger motors will have all sorts of practical (if not fundamental) consequences like bumping into objects while operating in homes or factories.</p><h2>Closing thoughts</h2><p>What should we conclude from Honor&#8217;s accomplishment? First, the capillary motor cooling solution, if mass manufacturable, is a genuine advance, and I suspect this running pace would not have been sustainable without it. Second, even if there wasn&#8217;t any &#8220;magic&#8221; needed, this was a really impressive engineering effort and result. 
For better or worse, it deserves to be a landmark comparable to Deep Blue v. Kasparov.</p><p>Having said that, I don&#8217;t believe this says anything at all about human half-marathon performances. It doesn&#8217;t even imply that a humanoid robot could join a race among a sea of people without GPS and resiliently finish the race. I wish those comparisons were left out of the press coverage.</p><p>Another thing I found interesting is that the Lightning robot was reportedly developed in about a year, between MWC in March 2025 and the April 2026 race. That is incredibly fast. However, what is even more stunning is that the R&amp;D team <a href="https://chinaresearchcollective.substack.com/p/honors-autonomous-humanoid-robot">reportedly had 2,600 people</a>. To my knowledge, that eclipses the combined headcounts of several US humanoid robot companies: Boston Dynamics, Figure, Agility, and Apptronik (I am not sure of Tesla&#8217;s Optimus-specific headcount). On top of that, you have to account for the partner and manufacturing ecosystem that was brought to bear, as reported by the same linked article.</p><p>Is all this worth it? It probably isn&#8217;t for most of these companies, which need to spend their resources developing applications customers need and will pay for, but the cooling and weight-reduction advances may well be useful for more practical purposes like carrying heavy payloads down the line.</p><p><em>If you enjoyed this post, please like (&#10084;&#65039;) and restack &#8212; it helps others find my writing. Subscribe to receive new posts. 
All of this is greatly appreciated.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h2>References</h2><ul><li><p><a href="https://gist.github.com/avikde/496d108195a040763fd9b610f870d071">Script used for power estimates</a> (GitHub gist)</p></li><li><p><a href="https://docs.google.com/spreadsheets/d/1spBdXsc9IK0wgs-ISgCVRF1hi4WsSuF2xuNKQCzFoPk/edit?gid=0#gid=0">Spreadsheet with motor parameters and estimates</a></p></li></ul><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>This simplification makes it seem like one could just have a heavily geared knee motor and a lightly geared hip motor, but this breaks down when you actually consider the full leg kinematics. Many of the leg joints participate in force production and swing, and one isn&#8217;t isolated to the knee motor as our cartoon might suggest. Additionally, a photo of the Honor robot strongly suggests that the hip and knee motors are similar if not identical. 
For the level of detail (and guesswork) of this article, we must assume that they are the same.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>The larger motor will also make the whole robot heavier, but we don&#8217;t have sufficient information to predict by exactly how much, so we have to ignore this effect.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Building a reasoning hierarchical robotics pipeline from scratch]]></title><description><![CDATA[Part 5: A demo combining the best features of end-to-end and classical approaches]]></description><link>https://www.avikde.me/p/building-a-reasoning-hierarchical</link><guid isPermaLink="false">https://www.avikde.me/p/building-a-reasoning-hierarchical</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Tue, 07 Apr 2026 16:51:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!80Er!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>End-to-end Vision-Language-Action (VLA) models bundle perception, reasoning, and motor control into a single network, but that means the camera, kinematics, and training scenarios are all baked in together. This could cause <a href="https://www.avikde.me/debugging-as-architecture-insight">unexpected</a> and <a href="https://www.avikde.me/a-coding-agent-equivalent-for-robotics">unresolvable</a> issues when the task, embodiment, or environment changes.</p><p>To showcase some of the insights from this article series, I&#8217;ve put together a demonstration that you can try out, modify, and learn from. 
This demo combines the flexible task programming and reasoning of the Gemini ER Vision-Language Model (what is the scene, and what should I do?) with classical camera calibration, kinematics, and motion controllers.</p><p>This post describes how it is put together, goes over some of its interesting capabilities, and examines the aspects of its design that directly impact its strengths and weaknesses. To conclude, we will compare this approach against fully modular (model-based) as well as fully end-to-end methods. The <a href="https://github.com/avikde/vla-pipeline">code is open source</a>, and I&#8217;m putting the ideas out there for discussion and feedback.</p><p><em>This article is the last part of a series on end-to-end robotics pipelines. Links to the other articles are below.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Trying out the demo</h2><p>To make it as accessible as possible, the demo runs in the browser with no software installation required, and can be accessed from your computer or even a phone. 
Click this button to open the page:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://avikde.github.io/vla-pipeline/&quot;,&quot;text&quot;:&quot;Link to demo&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://avikde.github.io/vla-pipeline/"><span>Link to demo</span></a></p><p>The environment is set up for tabletop manipulation with a robot arm. The colored blocks are objects that we can instruct the arm to move, the &#8220;plates&#8221; can serve as potential goal locations, and the grey cylinders can serve as obstacles to be avoided.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!80Er!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!80Er!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 424w, https://substackcdn.com/image/fetch/$s_!80Er!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 848w, https://substackcdn.com/image/fetch/$s_!80Er!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 1272w, https://substackcdn.com/image/fetch/$s_!80Er!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!80Er!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png" width="698" height="365.6826003824092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2773110-8a13-44cc-a173-9181feb51737_1046x548.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:548,&quot;width&quot;:1046,&quot;resizeWidth&quot;:698,&quot;bytes&quot;:636439,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/193310864?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6b73ae2-6c31-4e28-b0e8-1495ef5c3817_1046x760.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!80Er!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 424w, https://substackcdn.com/image/fetch/$s_!80Er!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 848w, https://substackcdn.com/image/fetch/$s_!80Er!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 1272w, 
https://substackcdn.com/image/fetch/$s_!80Er!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2773110-8a13-44cc-a173-9181feb51737_1046x548.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">What the <a href="https://avikde.github.io/vla-pipeline/">demo</a> scene looks like</figcaption></figure></div><p>The demo uses a Gemini Robotics ER model for task reasoning and perception. 
To try it out, you need to grab your own <a href="https://ai.google.dev/gemini-api/docs/api-key">Gemini API key</a> (free tier), or use the pre-baked fallback plan, which will execute the &#8220;Put the blocks away where they belong&#8221; default task. Correspondingly, click &#8220;Run Task&#8221; (with API key) or &#8220;Use Cached Task&#8221; and watch! Use the mouse to orbit the camera, and check the console for debug logs.</p><h3>What it does well</h3><p><strong>Flexible task programming and reasoning. </strong>Tasks can be prompted without needing task-specific programming, which is a major selling-point: the possible tasks are not limited by what is programmed at the factory. Gemini processes the prompt together with the scene and can break down multi-step tasks. We&#8217;ll go over how Gemini&#8217;s outputs are used by the rest of the system below.</p><p>Results from some tasks:</p><blockquote><p>Place the red block on the blue target</p></blockquote><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;a5296793-58f7-4a44-9e61-c8dd633f87f2&quot;,&quot;duration&quot;:null}"></div><p>This simple task shows the VLM&#8217;s visual and task understanding. Additionally, its language understanding can parse semantically similar words in the context of the scene (e.g. block vs. cube, or plate vs. coaster vs. target).</p><p>The video also shows the <strong>reactive obstacle avoidance</strong> allowing the arm to not collide with the cylindrical obstacles. This capability, with associated safety benefits, does not require any training or motion primitives to be built into the VLM. 
More on that below.</p><div><hr></div><blockquote><p>&#8220;Put the blocks on matching targets&#8221;</p></blockquote><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;be487580-1c33-4c89-acc2-549f274f5546&quot;,&quot;duration&quot;:null}"></div><p>The VLM successfully reasons that blocks go on color-matched plates, and breaks down the task into a number of steps (move red block, move blue block).</p><div><hr></div><blockquote><p>&#8220;Swap the red and blue blocks&#8221;</p></blockquote><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;71d58308-02b2-4c31-a33a-935c39af17ae&quot;,&quot;duration&quot;:null}"></div><p>This task requires a multi-step plan to move one of the blocks out of the way first, and the selection of a free location to store it.</p><p>The wireframes displayed in the animation show the <strong>spatial understanding</strong> ability built from a combination of a Gemini <strong>VLM with classical computer vision</strong>. Objects in the scene are semantically classified &#8212; into objects (blue wireframes), potential goal locations (green), and potential obstacles (black) &#8212; by the VLM guided by prompting, without hardcoding.</p><p>As a note of caution, I had a few runs where it chose the &#8220;free&#8221; location incorrectly on top of another block.</p><div><hr></div><blockquote><p>&#8220;Put away the blocks&#8221;</p></blockquote><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;8efce8e3-b589-47eb-b3a6-391097177f42&quot;,&quot;duration&quot;:null}"></div><p>The success of this (underspecified) prompt showcases the language and intent understanding of the VLM. 
However, I will temper this with the note that in some runs it tried to move the green block and confused itself. Feel free to <a href="https://avikde.github.io/vla-pipeline/">try it yourself</a>!</p><div><hr></div><blockquote><p>&#8220;Wave&#8221;</p></blockquote><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;b8000122-8753-40b5-93c0-1138407cab5e&quot;,&quot;duration&quot;:null}"></div><p>This silly task shows that the VLM&#8217;s task understanding goes beyond tabletop manipulation, as it can produce waypoints intended just for arbitrary motion. However, this demo will only successfully perform horizontal motion due to the 2-dimensional understanding of the VLM (more on that below).</p><h3>What is challenging</h3><p>The principal weaknesses also stem from the 2-dimensional understanding of the VLM.</p><blockquote><p>&#8220;Stack the blocks&#8221;</p></blockquote><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;2bb92616-d869-49b6-a656-2f4413d73998&quot;,&quot;duration&quot;:null}"></div><p>It correctly moves multiple blocks to the same horizontal position, but does not properly reason about the vertical location of each drop-off. 
This results in the later blocks getting smashed into the ones already placed.</p><h2>The architecture explains strengths and weaknesses</h2><p>The architecture of the <a href="https://avikde.github.io/vla-pipeline/">demo</a> is shown below:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iYxp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iYxp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 424w, https://substackcdn.com/image/fetch/$s_!iYxp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 848w, https://substackcdn.com/image/fetch/$s_!iYxp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 1272w, https://substackcdn.com/image/fetch/$s_!iYxp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iYxp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png" width="1382" height="292" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:292,&quot;width&quot;:1382,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:62158,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/193310864?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iYxp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 424w, https://substackcdn.com/image/fetch/$s_!iYxp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 848w, https://substackcdn.com/image/fetch/$s_!iYxp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 1272w, https://substackcdn.com/image/fetch/$s_!iYxp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2534de47-e804-4c83-9d84-51efa01d5293_1382x292.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Gemini (VLM) blocks are blue, and blocks built using classical methods are green. 
Each layer is independently swappable, and the AI model doesn&#8217;t need to know anything about the robot&#8217;s embodiment. This recreates the modularity of a <a href="https://www.avikde.me/the-architecture-behind-end-to-end">Sense-Plan-Act</a> architecture while retaining the semantic reasoning of a foundation AI model.</p><h3>Vision-Language Model (VLM)</h3><p>The demo uses <a href="https://ai.google.dev/gemini-api/docs/robotics-overview">Gemini ER</a>, whose inclusion I previously motivated <a href="https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics">with a coding agent analogy</a>. Its inputs are the text prompt and a single image, and its outputs are grounded in pixels in the same image. This keeps its behavior well-defined and decoupled from the robot embodiment, solving many of the issues with <a href="https://www.avikde.me/p/debugging-as-architecture-insight">X-VLA in a similar setup</a>.</p><p>However, it builds in a few assumptions that should be acknowledged. 
Most importantly, its understanding of the world is decidedly planar (pixels in the image plane).<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> The view must therefore be chosen to avoid occlusion, parallax-related issues as the camera moves, and tasks that require positioning along the camera axis (like the block-stacking task above).</p><p>Gemini ER can be prompted to output structured JSON, which is easy to work with in downstream layers. The system is first prompted for &#8220;perception&#8221;, which does object detection, semantic classification, and bounding box identification. All of these are very common functions, and easy for this model to complete in ~1 second. An example output for the perception block is below:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">[
   {
      "label":"green block", // &lt;- a name
      "point":[637, 232], // &lt;- position in image coordinates
      "box_2d":[531, 157, 743, 305], // &lt;- image coordinates of bounding box
      "type":"block" // &lt;- semantic classification
   }, // ... other detections 
]</code></pre></div><p>The next step asks the model to plan the motion for a task. We specify an output format that limits the output to &#8220;<a href="https://ai.google.dev/gemini-api/docs/function-calling">calling functions</a>&#8221; that the arm and its lower-level controller are capable of executing. Example output from Gemini (which took anywhere from 4 to 10 seconds):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">[
   {
      "function":"move", // a function that moves the arm
      "args":[584, 753, false] // position (image coords) + 1 bit indicating height
   },
   {
      "function":"setGripperState", // a function that closes or opens the gripper
      "args":[false] // false to close, true to open
   }, // ... other steps
]</code></pre></div><p>The full prompts to get these outputs are <a href="https://github.com/avikde/vla-pipeline/blob/main/web/gemini-er.js">part of the open-source package</a>.</p><h3>Spatial understanding</h3><p>First, we convert the image-plane understanding of the VLM into spatially accurate waypoints that the arm can act on. For this conversion, I also sampled depth values from the camera location (easily reproducible with stereoscopy or a model like <a href="https://arxiv.org/abs/2511.10647">DepthAnything</a>). I chose to use the bounding boxes to isolate the depth values in a region around the object center, and used those values to fit primitive shapes to the detections (rendered with wireframes in the videos above). This can be done with well-understood camera geometry transformations, and also allows for relocation of the camera, <a href="https://www.avikde.me/p/debugging-as-architecture-insight">unlike in a VLA</a> where the camera geometry is inextricably tied to the rest of the model. The outputs of this block are 3D waypoints and a representation of the obstacles.</p><p>The object shape affects how well a bounding box captures inlying depth pixels. Gemini also has a native ability to output segmentation masks, which could allow for further refinement in this computational block.</p><h3>Model-based local planner</h3><p>The next part is a model-based local planner that actually generates control signals. This decouples the control rate from the slow runtime of the VLM completely, and no retraining is needed to generate novel motions for new scene compositions. <a href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation">Adaptations for payload</a> could be built into this layer without affecting the VLM.</p><p>For obstacle avoidance, we use a &#8220;potential field&#8221; that pushes the end-effector away from obstacles (you can see orange arrows appearing briefly in the animations above) while it moves toward the desired goal. 
This is a classic reactive <a href="https://modernrobotics.northwestern.edu/nu-gm-book-resource/10-1-overview-of-motion-planning/">motion planning technique</a>, one of a family of well-understood algorithms along with sampling-based and grid-search planners.</p><h2>The interface is crucial</h2><p>The VLA approach leaves no choices to be made about the type of interface: when trained on the same embodiment end-to-end, input pixels get mapped straight through to actions. However, with this hierarchical controller, the choice of interface is quite important. While the hierarchy resolves many of the drawbacks of the full end-to-end approach that <a href="https://www.avikde.me/p/debugging-as-architecture-insight">held back a demonstration like this</a>, one interpretation of the <a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">bitter lesson</a> is that <em>any</em> hand-crafted interface design hampers system performance.</p><p>For example, for grasp generation in this demo, we have to assume that knowing the location of a block is sufficient to produce an action to grasp it. However, different grasping actions may be needed for soft or unusually shaped objects, such as eggs or cloth. One possible extension to resolve this is to incorporate a grasp generation module seeded by the object centers and bounding boxes. A VLA directly outputs actions and so is not limited by this kind of architectural judgment, but it may also require a lot more training data and fail unpredictably when out of distribution.</p><h2>Scoring the criteria from the first article</h2><p>The architecture described above is neither an end-to-end VLA, nor a modular model-based one. 
For specificity in this section, I&#8217;ll take the former camp to be represented by something like Physical Intelligence&#8217;s models (a small version of which we <a href="https://www.avikde.me/p/debugging-as-architecture-insight">tried hands-on with X-VLA</a>), and the latter by the MIT 2014 Atlas method. Both were discussed in the <a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">first part of this series about end-to-end robotics pipelines</a>.</p><p>All that said, where does this &#8220;hybrid&#8221; hierarchical strategy fall? 
We identified a number of criteria in previous articles, and can try to roughly size up where each falls:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/9cRzg/3/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9bf688c-9add-4d6a-ba7a-a65ef5b80d46_1220x1114.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a286a468-da03-436a-8e59-2923b0911406_1220x1114.png&quot;,&quot;height&quot;:687,&quot;title&quot;:&quot;Scoring end-to-end&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/9cRzg/3/" width="730" height="687" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>My summary would be that the end-to-end method can be the best <em>if it is scaled potentially ad infinitum and has very fast computational hardware</em>, which has practical (data requirements) and efficiency drawbacks. I think the hybrid architecture could be a good middle ground to greatly expand capabilities with less data and added safety and efficiency, but has some bottlenecks from interface choices that may impact some applications (but in a predictable way). 
I&#8217;m open to your thoughts &#8212; let me know below!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/building-a-reasoning-hierarchical/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/building-a-reasoning-hierarchical/comments"><span>Leave a comment</span></a></p><h2>Closing thoughts</h2><p>This demo was put together with models released within the last year, but also with ideas that have existed for decades. We&#8217;ve been seeing transformational improvement in the capabilities of deep neural networks, leading in many cases to large strategic shifts to embrace fully end-to-end architectures. However, this shift brings with it new problems in safety, efficiency, and predictability. This post goes over a proposal for a hybrid architecture that attempts to draw on the strengths of both camps.</p><p>There is room for improvement in end-to-end VLA approaches with scaling, as well as in this kind of hybrid architecture (faster VLM inference, multi-view VLM). <a href="https://itcanthink.substack.com/p/will-world-models-allow-robots-to">&#8220;World model&#8221; methods</a> are rapidly gaining popularity as a component of larger modular pipelines (stay tuned for future posts on this topic). 
I also plan to look more into how to build an &#8220;embodied reasoning&#8221; open-weight VLM in future posts.</p><p><em>Please check out the<strong> <a href="https://avikde.github.io/vla-pipeline/">demo</a></strong>, and the <strong><a href="https://github.com/avikde/vla-pipeline">source code</a></strong>.</em></p><p><em>If you liked this post, please <strong>like &#9825;</strong>, <strong>share</strong>, <strong>restack</strong>, and <strong>subscribe</strong> &#8212; it helps others find my writing.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h2>Further reading</h2><p>Other articles in this series:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;63dcb3ae-a9ad-4b7e-8d92-d3e18211ca4f&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The architecture behind &#8220;end-to-end&#8221; robotics pipelines&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-26T21:19:56.368Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!766O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-architecture-behind-end-to-end&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:185869291,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:23,&quot;comment_count&quot;:15,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;9b0c6a02-32e1-4b94-a6e1-e598a5cbfa76&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;\&quot;Is it learning?\&quot; Online motor adaptation in end-to-end robotics&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-03T17:51:24.836Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!x8Re!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/is-it-learning-online-motor-adaptation&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:186635241,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:5,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;ae055054-9c13-4b3a-9b55-871516d6b046&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Debugging as architecture insight: dissecting a VLA&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-26T15:46:18.127Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!zyjp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/debugging-as-architecture-insight&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:188827303,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:14,&quot;comment_count&quot;:2,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;8553c497-5203-479b-acb0-6b29e9923dd0&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;A coding agent equivalent for robotics pipelines&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-26T18:18:42.566Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!IGFD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee72e6-b019-4210-ab60-5d852f7b3f90_640x480.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:192049893,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:12,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I wonder if a VLM could be built with stereoscopic vision and some way to associate objects in the two images. Let me know in the comments if you know of anything like this!</p></div></div>]]></content:encoded></item><item><title><![CDATA[A coding agent equivalent for robotics pipelines]]></title><description><![CDATA[Part 4: Closing the action loop with a VLA vs. 
a spatial VLM "agent"]]></description><link>https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics</link><guid isPermaLink="false">https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Thu, 26 Mar 2026 18:18:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IGFD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee72e6-b019-4210-ab60-5d852f7b3f90_640x480.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This article is part of a series on end-to-end robotics pipelines</em></p><ol><li><p><a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">The architecture behind &#8220;end-to-end&#8221; robotics pipelines</a></p></li><li><p><a href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation?r=5vzx85">Online motor adaptation</a></p></li><li><p><a href="https://open.substack.com/pub/minpower/p/debugging-as-architecture-insight?utm_campaign=post-expanded-share&amp;utm_medium=web">VLA debugging insights</a></p></li><li><p>This article</p></li><li><p><a href="https://www.avikde.me/p/building-a-reasoning-hierarchical">Demo combining the best features of end-to-end and classical approaches</a></p></li></ol><div><hr></div><p>In this part, we finally close the loop to get our WidowX robot arm in the MuJoCo simulation to execute some manipulation tasks. 
I&#8217;ll go over how to build up (from scratch) something like the following behavior from a text prompt, and what we can learn about the architecture of robotics pipelines in the process.</p><p><em>Result of &#8220;Place the red block on the blue target&#8221;:</em></p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;b4a8f3ec-b69e-40e1-b2fe-fefb46fcd952&quot;,&quot;duration&quot;:null}"></div><p>An end-to-end Vision-Language-Action (VLA) model is the obvious modern technology <a href="https://open.substack.com/pub/itcanthink/p/vision-language-action-models-and?utm_campaign=post-expanded-share&amp;utm_medium=web">researchers and companies are moving toward</a> for this kind of functionality, and part 3 of this series was dedicated to understanding such models from the inside. The deployment exercise for this part made clear that a small VLA&#8217;s failure modes are difficult, to the point of impossible, to eliminate without retraining.</p><p>That observation ultimately forced a pivot to a different architecture, where the flexible programming and semantic reasoning layer delegates physical grounding to explicitly separate tools. This post explains how that architecture works, and what it says about robotics pipelines more broadly.</p><p>In addition to the story, all the <a href="https://github.com/avikde/vla-pipeline">code is open-source</a>; feel free to learn from, star, and fork! Also, if you like this kind of post, please like, share, and subscribe.</p><h3>Moravec&#8217;s paradox and VLAs: control bandwidth problem</h3><p>In the previous post, we spent some time interpreting the perception and language understanding in VLAs, specifically the <strong>V</strong>ision and <strong>L</strong>anguage parts. This post turns to the action head, which in many ways exhibits the most architectural diversity in VLAs.</p><p>There are broadly two types of action heads. <strong>(Auto-)regressive</strong> action heads generate actions sequentially, one at a time. This is similar to how most LLMs work today, generating tokens one after the other. <strong>Generative</strong> (or diffusion / flow-matching) action heads, in contrast, generate a whole action sequence at a time and incrementally refine it, similar to diffusion-based image generators.</p><p>Regressive action generators have a fundamental difficulty when used for behavior cloning in continuous action spaces.
As Max Simchowitz presents in his recent CMU RI seminar<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, the issue is that a small deviation takes the red robot trajectory off the training (expert) demonstration distribution, and it is unable to recover.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EwG8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EwG8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 424w, https://substackcdn.com/image/fetch/$s_!EwG8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 848w, https://substackcdn.com/image/fetch/$s_!EwG8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 1272w, https://substackcdn.com/image/fetch/$s_!EwG8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EwG8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png" width="593" height="286.3179945054945" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:703,&quot;width&quot;:1456,&quot;resizeWidth&quot;:593,&quot;bytes&quot;:280583,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/192049893?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EwG8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 424w, https://substackcdn.com/image/fetch/$s_!EwG8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 848w, https://substackcdn.com/image/fetch/$s_!EwG8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 1272w, https://substackcdn.com/image/fetch/$s_!EwG8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe547b466-3fd7-4d5a-8ba2-2326997f1904_1488x718.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Compounding error with regressive actions (source: Simchowitz RI seminar)</figcaption></figure></div><p>The same problem doesn&#8217;t occur in discrete spaces (like text generation) because they can be trained with a 0-1 or cross-entropy loss function, encouraging very aggressive contraction to the training distribution. Simchowitz identifies this challenge in continuous spaces with Moravec&#8217;s paradox (why learning hasn&#8217;t been as effective in physical tasks as in symbolic tasks like language).</p><p>Action chunking<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> presents a way to get around this problem.
By producing an action sequence, over which the natural dynamics of the system is assumed to prevent compounding error, the rate of divergence is kept under control:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mx3T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mx3T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 424w, https://substackcdn.com/image/fetch/$s_!mx3T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 848w, https://substackcdn.com/image/fetch/$s_!mx3T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 1272w, https://substackcdn.com/image/fetch/$s_!mx3T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mx3T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png" width="595" height="238.24519230769232" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:583,&quot;width&quot;:1456,&quot;resizeWidth&quot;:595,&quot;bytes&quot;:290357,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/192049893?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mx3T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 424w, https://substackcdn.com/image/fetch/$s_!mx3T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 848w, https://substackcdn.com/image/fetch/$s_!mx3T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 1272w, https://substackcdn.com/image/fetch/$s_!mx3T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bcaf7a9-58cb-49a0-a3bd-c1412e5cff2f_1478x592.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Managed error with action chunking (source: Simchowitz RI seminar)</figcaption></figure></div><p>A key assumption there was that the underlying system needs to have some strong 
stability properties. I won&#8217;t go into definitions here, but in practice, this means that VLA actions are almost exclusively desired positions (as opposed to velocities or torques). More generally, it means that the behavior is &#8220;quasi-static&#8221;, i.e., the robot moves through a sequence of statically stable configurations. As an aside, this is also why VLA-implemented manipulation behaviors are slow, and why the approach wouldn&#8217;t carry over to dynamic behaviors like agile locomotion; quoting <a href="https://www.quantamagazine.org/why-do-humanoid-robots-still-struggle-with-the-small-stuff-20260313/">this Quanta Magazine article</a>, &#8220;Atlas moves like molasses while grasping auto parts but glides like a gymnast when it&#8217;s not touching anything except the floor&#8221;.</p><p>So, action chunking is a way to address the <strong>control bandwidth problem</strong> from part 1 for regressive policies. Generative action heads don&#8217;t have the same inherent divergence issue, and in practice most VLAs seem to use that strategy, including our <a href="https://www.avikde.me/p/debugging-as-architecture-insight?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">demo setup with X-VLA</a> from part 3. They learn a distribution over actions, and at inference time, start from a pure &#8220;noise&#8221; action and iteratively denoise it. One thing to note is that the trajectory horizon <a href="https://generalrobots.substack.com/p/robotera-snatches-silver-in-sock/comment/227549490">does not impact how long the inference takes</a> (it only sets the size of the action distribution learned during training).
This means that shortening the action horizon to get faster inference isn&#8217;t an option, as it typically is in model-predictive control.</p><p>Now that we understand VLA action heads a little better, let&#8217;s move on to closing the action loop.</p><h3>Closing the loop with X-VLA: generalization and separation problems</h3><p>The VLA outputs action chunks (sequences of desired poses), and we now need to control the motors to reach them. The model for the WidowX arm in our simulation is set up for position control on the joints. This is in part due to how most people use this arm (in some cases due to the algorithmic constraints mentioned above). For this article, I chose to keep that as is, and as a first pass, implement the most reasonable control method in this situation: inverse kinematics (IK). The <a href="https://github.com/avikde/vla-pipeline/blob/main/scripts/widowx_control.py">implementation</a> uses gradient descent to iteratively find the joint angles that reach a given pose. This is a general and quick method that will probably get replaced in the last part of the series by a non-IK solution.</p><p>I closed the control loop, prompted the VLA to &#8220;pick up the red block&#8221;, and ran the simulation. Well, it didn&#8217;t work. At this point, I faced much the same &#8220;black box debugging&#8221; challenge as in <a href="https://www.avikde.me/p/debugging-as-architecture-insight?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">part 3</a>, but now with more (literal) moving pieces.</p><p>It&#8217;s important to remember that X-VLA is a small VLA, and its generalization capabilities are limited by model size. As we saw in part 3, the model&#8217;s spatial reasoning (how far to reach, when to close the gripper) is tightly coupled to the training camera viewpoints.
The camera intrinsic and extrinsic parameters are wrapped up in the full X-VLA policy and not separable<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>, so I tried to modify the images received by the policy to match the training dataset.</p><p>I went into the <a href="https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot/tree/main/videos">BridgeData training dataset</a>, found the most similar task, grabbed its training video, and tried to make my scene resemble it as closely as possible. To do this, I manually tuned the camera position, the robot gripper&#8217;s initial pose and framing (camera extrinsics), the image field of view (intrinsics), the aspect ratio &#8220;squishing&#8221; to match the training data, the lighting and shadows, and the table appearance:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cf9E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cf9E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 424w, https://substackcdn.com/image/fetch/$s_!cf9E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 848w, https://substackcdn.com/image/fetch/$s_!cf9E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 1272w,
https://substackcdn.com/image/fetch/$s_!cf9E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cf9E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png" width="522" height="281" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:281,&quot;width&quot;:522,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184644,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/192049893?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cf9E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 424w, https://substackcdn.com/image/fetch/$s_!cf9E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 848w, https://substackcdn.com/image/fetch/$s_!cf9E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 1272w, 
https://substackcdn.com/image/fetch/$s_!cf9E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F278e344b-cffe-4e81-bd91-5fbe5b1c0c33_522x281.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Unfortunately, despite the manual tuning, and also completely decluttering the scene, the policy didn&#8217;t succeed with the prompt <em>&#8220;Pick up the red block&#8221;</em>:</p><div class="native-video-embed" data-component-name="VideoPlaceholder"
data-attrs="{&quot;mediaUploadId&quot;:&quot;7eb1747d-d2b1-4ffa-9c4b-956d3d65c9c5&quot;,&quot;duration&quot;:null}"></div><p>It consistently overshot the block, which suggested a systematic error in the visual processing, but fiddling with the camera settings didn&#8217;t yield a better result. The structural issue with VLAs (non-separability of camera and kinematics parameters) makes this kind of debugging quite challenging, even with the techniques from part 3 in hand. If you know of anything that could have gotten this to work, let me know in the comments!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics/comments"><span>Leave a comment</span></a></p><p>I suspect that a VLA of this size simply doesn&#8217;t generalize well enough for the policy to be used zero-shot. That is a roadblock for two reasons. The first (specific to my usage here) is that I didn&#8217;t have a leader-follower arm or space mouse to collect more training data and go through a fine-tuning process. The second (and more fundamental) is that it limits how this kind of strategy can be used by robot end-users in ad-hoc, unknown environments.</p><p>The flexible task programming and semantic task understanding of VLAs were some of the motivations for this project.
Is there an alternative solution that can keep those strengths while adding some needed structure?</p><h3>An &#8220;agentic&#8221; modular alternative</h3><p>For scene and task understanding combined with flexible programming, we need some kind of VLM, but is there a way to get information out of the VLM in a more structured way?</p><p>In late 2025, Google released <a href="https://ai.google.dev/gemini-api/docs/robotics-overview">Gemini Robotics 1.5</a>, which consists of two models designed to have a hierarchical interface:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n4LZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n4LZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 424w, https://substackcdn.com/image/fetch/$s_!n4LZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 848w, https://substackcdn.com/image/fetch/$s_!n4LZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!n4LZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!n4LZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png" width="466" height="401.15210355987057" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1064,&quot;width&quot;:1236,&quot;resizeWidth&quot;:466,&quot;bytes&quot;:267144,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/192049893?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F758b443d-1be0-4826-b046-0200c3f2b6fd_1236x1064.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n4LZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 424w, https://substackcdn.com/image/fetch/$s_!n4LZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 848w, https://substackcdn.com/image/fetch/$s_!n4LZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!n4LZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcee23d6d-6e17-418d-b82a-d2cd9cbc9ff6_1236x1064.png 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Out of the two, I only used the ER (Embodied Reasoning) model, which has been trained to output structured text combining the spatial understanding and function calling capabilities of the impressive Gemini model family. 
As <a href="https://ai.google.dev/gemini-api/docs/robotics-overview">documented here</a>, the &#8220;pointing&#8221; feature is effectively a customizable vision processing pipeline, and I found it to be incredibly robust:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RARj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RARj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 424w, https://substackcdn.com/image/fetch/$s_!RARj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 848w, https://substackcdn.com/image/fetch/$s_!RARj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 1272w, https://substackcdn.com/image/fetch/$s_!RARj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RARj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png" width="474" height="248.109375" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:335,&quot;width&quot;:640,&quot;resizeWidth&quot;:474,&quot;bytes&quot;:180564,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/192049893?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee72e6-b019-4210-ab60-5d852f7b3f90_640x480.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RARj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 424w, https://substackcdn.com/image/fetch/$s_!RARj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 848w, https://substackcdn.com/image/fetch/$s_!RARj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 1272w, https://substackcdn.com/image/fetch/$s_!RARj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0aa3b7c3-36ec-4a7b-b979-dd2e2a735abd_640x335.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Gemini Robotics ER 1.5 &#8220;pointing&#8221; capability, when just presented this image and asked to point out up to 10 objects in the scene.</figcaption></figure></div><p>The function calling capabilities can also be used to break down complex tasks into sub-steps, which is what I used for the working demo in the first section of this article. 
Here you can see that it adapts to different prompts with no other changes:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;fe3f1923-93f1-4529-a960-00bf9106a41a&quot;,&quot;duration&quot;:null}"></div><p>Most shockingly, I spent only a couple of hours with the Gemini models to get to the successful end result above, after unsuccessful attempts over a significantly longer period with X-VLA.</p><p>So, why does this work so much more easily?</p><p>Just like Simchowitz did in his RI seminar, I&#8217;d offer a pragmatic answer that has to do with scale, as well as an algorithmic answer independent of it. On model size, Gemini ER 1.5 is described as achieving &#8220;the low latency of a Gemini Flash model&#8221; for spatial tasks, which suggests it&#8217;s Flash-scale (~8B range) and therefore much larger than X-VLA (0.9B). On the algorithmic side, the difficulties we ran into with the VLA often had to do with <strong>inseparability of concerns </strong>(kinematics and calibration parameters are not separable from the policy), and <strong>generalizability </strong>(it is difficult to tell when we are out of distribution).</p><p>I think an appropriate analogy here is between an LLM (even a coding-tuned one) and a coding <em>agent</em> like Claude Code (an LLM in a larger system that can interact with &#8220;tools&#8221;). A coding agent doesn&#8217;t ask the LLM to <a href="https://open.substack.com/pub/engrlog/p/why-skip-the-code-ship-the-binary?utm_campaign=post-expanded-share&amp;utm_medium=web">emit machine code directly</a>; it keeps the model in the semantic reasoning layer and delegates execution to existing well-understood tools. In this analogy, I&#8217;m suggesting that camera calibration, kinematics, and motion controllers are tools that the VLM can benefit from interfacing with. Gemini ER works only on images: a well-defined, separable concern that introduces no variability due to the robot morphology. 
Our known camera transformations then lift its image-space outputs into 3D. If we move the camera (impossible with X-VLA without retraining), we can simply replace the camera calibration parameters.</p><p>However, this structural separation appears to contradict the pure end-to-end view that goes back to the &#8220;bitter lesson.&#8221; In my opinion, the bitter lesson essay has been <a href="https://open.substack.com/pub/minpower/p/the-ai-world-models-debate-and-its?utm_campaign=post-expanded-share&amp;utm_medium=web">interpreted more broadly than current evidence supports</a>, and we will continue to see <a href="https://open.substack.com/pub/robonaissance/p/language-is-poison-part-2-the-bitter?utm_campaign=post-expanded-share&amp;utm_medium=web">reinterpretations</a> and corrections.</p><h3>Closing thoughts</h3><p>In this part of our series on robotics pipelines, we demonstrated a simple setup that exhibits flexible task programming. Our best efforts with an end-to-end VLA fell short; the success came instead from coupling a strong VLM with model-based &#8220;tools&#8221; such as camera geometry and inverse and forward kinematics. This seems to me to reflect some of the strengths of agents that interact with tools vs. an equivalent chatbot-style LLM. It certainly provided a clean way to integrate the strengths of a large learning-based model with structured model-based methods, something I&#8217;d set as a goal in part 1 of this series.</p><p>While this is a nice result, there are still a number of limitations: Gemini&#8217;s task planning is slow, even with cloud hardware. In the current implementation, the full plan is created at startup and there is no replanning for dynamic environments. The model is also not &#8220;open&#8221; and is likely an order of magnitude larger than X-VLA. 
In the future, I may look into what it takes to develop an &#8220;embodied reasoning&#8221; model; the Gemini ER model appears to build on the ideas of the published <a href="https://arxiv.org/abs/2401.12168">SpatialVLM</a>.</p><p>In the last part of this series, I plan to upgrade the lower-level controller from its naive IK implementation to show more responsive and <a href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">adaptive</a> behavior. I also aim to publish it in a browser-runnable format so you can quickly see the effects of different prompts. As a reminder, all of the code is <a href="https://github.com/avikde/vla-pipeline">open-source</a>.</p><p><em>If you liked this post, please like (&#9825;), share, restack, and subscribe; it helps others find my writing.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://youtu.be/UX1YXcRnFbs?si=wWY1LMwwtseW79Ku">Simchowitz RI seminar</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2304.13705">ACT paper</a>, whose author is a founder of Sunday Robotics, who in turn have an <a href="https://www.sunday.ai/journal/no-robot-data">ACT-1 foundation model</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>X-VLA has a soft-prompt architecture where embodiment specific parameters are technically separated, but not in an interpretable form.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Lessons from AVs on safety in end-to-end pipelines]]></title><description><![CDATA[Recent developments in autonomous vehicles on recognizing and handling distribution shift]]></description><link>https://www.avikde.me/p/lessons-from-avs-on-safety-in-end</link><guid isPermaLink="false">https://www.avikde.me/p/lessons-from-avs-on-safety-in-end</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Fri, 20 Mar 2026 18:48:15 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!8Xju!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This short post covers a couple of recent updates from the autonomous vehicle (AV) industry that connect to safety in robotics more broadly.</p><h3>Recognizing performance deterioration</h3><p>This <a href="https://www.theverge.com/transportation/897303/tesla-full-self-driving-nhtsa-probe-march-2026">Verge article from March 19</a> reports that there could be an impending recall of Tesla&#8217;s Full Self-Driving (FSD) service. I&#8217;m not interested in making any judgments about self-driving capability, but rather in whether the root cause holds lessons for robotics more broadly.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KoIT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KoIT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 424w, https://substackcdn.com/image/fetch/$s_!KoIT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 848w, https://substackcdn.com/image/fetch/$s_!KoIT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KoIT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KoIT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png" width="537" height="214.62760834670948" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:498,&quot;width&quot;:1246,&quot;resizeWidth&quot;:537,&quot;bytes&quot;:124713,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/191604982?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KoIT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 424w, https://substackcdn.com/image/fetch/$s_!KoIT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 848w, 
https://substackcdn.com/image/fetch/$s_!KoIT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 1272w, https://substackcdn.com/image/fetch/$s_!KoIT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e8c97af-1c46-4e36-b5b0-17f23172671c_1246x498.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">Source: The Verge article linked above. Emphasis mine.</figcaption></figure></div><p>The issue appears to be that the system <strong>didn&#8217;t know when it wasn&#8217;t working well </strong>(causing the issues in the NHTSA filing), or that it did and didn&#8217;t notify the driver (which is unlikely, so we&#8217;ll assume the former).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Xju!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8Xju!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8Xju!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!8Xju!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8Xju!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8Xju!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg" width="583" height="293.10164835164835" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:732,&quot;width&quot;:1456,&quot;resizeWidth&quot;:583,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tesla Full Self-Driving Beta 10.69 barrier&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tesla Full Self-Driving Beta 10.69 barrier" title="Tesla Full Self-Driving Beta 10.69 barrier" srcset="https://substackcdn.com/image/fetch/$s_!8Xju!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8Xju!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!8Xju!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8Xju!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a8506d-3fba-41db-bafc-008bc52758a9_1600x804.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Tesla FSD (<a 
href="https://electrek.co/2026/03/19/nhtsa-upgrades-tesla-fsd-visibility-investigation-3-2-million-vehicles/">source</a>)</figcaption></figure></div><p>This phenomenon isn&#8217;t isolated to AVs. The latest article in my Vision-Language-Action (VLA) robotics pipeline series took a hands-on look at <a href="https://www.avikde.me/i/188827303/vla-debugging-ideas-and-techniques">debugging one</a>, and while we found some techniques that can aid developers, they didn&#8217;t directly help at inference time. Item 1 in <a href="https://ruixu.us/posts/six-things-robotics-startup">Rui Xu&#8217;s candid post-mortem</a> of K-Scale Labs mentions the pitfalls of trusting a &#8220;large model&#8221; vs. dedicated safety features. Recent papers on VLAs also report fragility when moving away from the training distribution (e.g., <a href="https://arxiv.org/html/2506.09930v1">Fang et al., Jun 2025</a>; <a href="https://arxiv.org/html/2512.16760v2">Hu et al., Jan 2026</a>).</p><h3>Potential solutions: redundancy, confidence, architecture</h3><p>NVIDIA recently announced their new Alpamayo model and accompanying AV stack as a reference open model and toolchain. 
During the CES 2026 keynote, Jensen Huang said something intriguing about safety:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MHGN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MHGN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 424w, https://substackcdn.com/image/fetch/$s_!MHGN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 848w, https://substackcdn.com/image/fetch/$s_!MHGN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 1272w, https://substackcdn.com/image/fetch/$s_!MHGN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MHGN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png" width="570" height="302.22527472527474" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:772,&quot;width&quot;:1456,&quot;resizeWidth&quot;:570,&quot;bytes&quot;:203362,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/191604982?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MHGN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 424w, https://substackcdn.com/image/fetch/$s_!MHGN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 848w, https://substackcdn.com/image/fetch/$s_!MHGN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 1272w, https://substackcdn.com/image/fetch/$s_!MHGN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e20eb32-d24e-4ebc-89e5-6ca44eceb0a4_1464x776.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: <a href="https://junkoyoshidaparis.substack.com/p/nvidia-pulling-an-elon-might-have">Junko&#8217;s Tech Probe article</a></figcaption></figure></div><p>This parallel or hybrid architecture, with a classical stack and a policy arbitrator, was also covered in this <a href="https://counterpointresearch.com/en/insights/counterpoint-conversations-nvidia-at-ces-from-full-stack-autonomy-to-an-open-ecosystem-play">Counterpoint Research article</a>. Interestingly, I can&#8217;t find references from NVIDIA themselves about this parallel system other than Jensen&#8217;s keynote; it may simply be early in development.</p><p>A related approach is to have the VLA itself output some kind of confidence estimate (vs. relying on a separate &#8220;policy arbitrator&#8221;). 
<a href="https://arxiv.org/pdf/2507.17383">Zollo et al (Dec 2025)</a> formalizes the problem of confidence calibration for VLA policies, describes how to extract confidence estimates from contemporary VLA architectures, and notes that current VLAs lack a reliable mechanism for quantifying the uncertainty of their chosen action sequences. It also introduces two potential remedies: prompt ensembles and action-wise Platt scaling.</p><p>Lastly, inserting some debuggable interfaces into end-to-end pipelines can facilitate inspection and safety &#8212; lower-level controllers can apply dedicated safety constraints based on the information passed down from a higher-level controller. This <a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">appears to still be possible</a> in most successful humanoid robotics demonstrations of today due to a combination of factors. Keeping that architectural feature around may have long-standing benefits, based on current events in the AV industry!</p><p></p><p>Thanks for reading! I have been working on the next part of the <a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">end-to-end pipeline series</a>, with a deep dive into the action head and closed-loop behavior. If you liked this post, please share and subscribe.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Systolic arrays for general robotics, AI, and scientific computing]]></title><description><![CDATA[MatMuls dominate today's accelerators, but the original vision was much broader]]></description><link>https://www.avikde.me/p/systolic-arrays-for-general-robotics</link><guid isPermaLink="false">https://www.avikde.me/p/systolic-arrays-for-general-robotics</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Thu, 12 Mar 2026 15:09:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YIQz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The TPU (Tensor Processing Unit), introduced by Google in a whirlwind project ~2015, has now become synonymous with hardware acceleration for deep neural networks. 
I&#8217;ve listed some references below for further reading on the TPU (I&#8217;d especially recommend <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Babbage&quot;,&quot;id&quot;:102722254,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F82525b9c-ee3c-4996-916c-54267a4d354b_416x416.png&quot;,&quot;uuid&quot;:&quot;8da5c836-c587-4146-bce3-64a9c55735ee&quot;}" data-component-name="MentionToDOM"></span>&#8217;s <a href="https://thechipletter.substack.com/p/googles-first-tpu-architecture">historically situated introduction</a>), but at the core of the TPU is a matrix multiplication unit (MXU) that achieves high-throughput, highly efficient matrix multiplication. Since then, the concept has been integrated into a huge variety of hardware accelerators for neural networks (Groq LPU, NVIDIA Tensor Cores, Apple Neural Engine, Qualcomm Hexagon, and most NPUs), so you may think that it was Google&#8217;s ML inference ambitions that started this <a href="https://thechipletter.substack.com/p/ai-accelerators-the-cambrian-explosion">Cambrian explosion</a> in matrix multiplication acceleration, but that would be almost 40 years off the mark.</p><p>All these matrix multiplication units are based on the systolic array, an architectural concept invented by HT Kung at Carnegie Mellon University in the late <em>1970s</em>. And Kung&#8217;s group didn&#8217;t stop at matrix multiplication; they presented a concept of systolic <em>networks</em> of arbitrary processing <em>nodes</em> that could do way more. 
While some of those concepts appear in niche signal-processing ASICs today, the dominance of deep neural networks over the last decade has caused this history and potential to be significantly overlooked in my opinion.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bf6n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bf6n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Bf6n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Bf6n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Bf6n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bf6n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg" width="267" height="267" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:267,&quot;width&quot;:267,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18185,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bf6n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Bf6n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Bf6n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Bf6n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca0876d-d71e-47d0-8e37-2550cd332955_267x267.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://seas.harvard.edu/person/ht-kung">HT Kung</a></figcaption></figure></div><p>My interest in this (and the goal of this article) is twofold: (1) Shine a spotlight on this fascinating research and preview the types of problems that can be solved with systolic architectures. (2) Dig into and potentially uncover jumps in performance and efficiency for AI and robotics. 
I believe that holistic full-stack understanding and optimization (bringing together algorithms and hardware) will be key to advancing these technologies.</p><p>We won&#8217;t stop at the theoretical overview in this post. Leveraging the computer engineering experience and storytelling of <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Chip Insights&quot;,&quot;id&quot;:2850528,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/chipinsights&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png&quot;,&quot;uuid&quot;:&quot;59de6f17-98bb-4b82-8beb-9b1104da007d&quot;}" data-component-name="MentionToDOM"></span>, we will actually build up accelerators to use in general-purpose robotics, AI, and scientific applications. We have an article coming soon with the first step, so make sure to subscribe!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h3>Why systolic architectures</h3><p>A systolic architecture is characterized by a network of processing elements (PEs) that feed data to each other instead of each going to the memory hierarchy for its operands.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YIQz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!YIQz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 424w, https://substackcdn.com/image/fetch/$s_!YIQz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 848w, https://substackcdn.com/image/fetch/$s_!YIQz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 1272w, https://substackcdn.com/image/fetch/$s_!YIQz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YIQz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png" width="1138" height="596" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:596,&quot;width&quot;:1138,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:81201,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/190643644?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c468baf-56ba-42f6-96ee-4b5d88455188_1138x668.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!YIQz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 424w, https://substackcdn.com/image/fetch/$s_!YIQz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 848w, https://substackcdn.com/image/fetch/$s_!YIQz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 1272w, https://substackcdn.com/image/fetch/$s_!YIQz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F932fc917-c77d-4254-b4b4-29c99149e1b5_1138x596.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Systolic array concept from Kung (1982)</figcaption></figure></div><p></p><p>The core benefits are:</p><ul><li><p>It alleviates <strong>memory bottlenecks</strong> by allowing multiple compute operations to occur for each trip to memory (as nicely depicted by the figure above). A properly designed array balances computation time with I/O, so that neither stalls waiting on the other.</p></li><li><p>It can create <strong>simple, regular designs</strong> &#8594; a modular setup that can be extended for different functions. It is relatively easy to write the RTL!</p></li><li><p>2D arrays can very easily be <strong>deeply pipelined</strong> (as we will see below), naturally taking advantage of algorithm concurrency.</p></li></ul><p>The PE network can look like a 1D array (pictured above), a 2D array (the most common today), or even use other connection patterns for specialized computations. 
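</p><p>To make the first benefit a bit more concrete, here is a minimal Python sketch of a 1D chain of PEs. It is only a toy model under my own assumptions, not something taken from Kung&#8217;s paper: the function <em>run_chain</em>, the <em>add_one</em> stage, and the input values are all hypothetical, and each PE is modeled as a plain function that takes one tick.</p>

```python
def run_chain(stages, items):
    """Push items through a 1D chain of PEs, one hop per tick.

    stages: one function per PE (hypothetical stand-ins for real PE logic).
    items:  values read from memory, one per tick; None is reserved for "empty".
    """
    regs = [None] * len(stages)  # value sitting at the output of each PE
    pending = list(items)
    outputs, ticks, ops, memory_reads = [], 0, 0, 0
    while pending or any(r is not None for r in regs):
        # Only the boundary PE hands results back to the outside world.
        if regs[-1] is not None:
            outputs.append(regs[-1])
        # Update from the last PE backwards so each value moves one PE per tick.
        for i in range(len(stages) - 1, 0, -1):
            prev = regs[i - 1]
            if prev is None:
                regs[i] = None
            else:
                regs[i] = stages[i](prev)
                ops += 1
        # The first PE is the only one that reads from memory.
        if pending:
            regs[0] = stages[0](pending.pop(0))
            memory_reads += 1
            ops += 1
        else:
            regs[0] = None
        ticks += 1
    return outputs, ticks, ops, memory_reads


def add_one(v):
    """Made-up PE operation, used only for illustration."""
    return v + 1


outputs, ticks, ops, memory_reads = run_chain([add_one] * 3, [0, 10, 20, 30])
print(outputs)             # [3, 13, 23, 33]
print(ops / memory_reads)  # 3.0 operations per memory read
print(ticks)               # 7
```

<p>In this toy run, each value read from memory feeds three operations, and the four items finish in 7 ticks because the PEs work on different items at the same time. 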
</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2-0_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2-0_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 424w, https://substackcdn.com/image/fetch/$s_!2-0_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 848w, https://substackcdn.com/image/fetch/$s_!2-0_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 1272w, https://substackcdn.com/image/fetch/$s_!2-0_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2-0_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png" width="716" height="186" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:186,&quot;width&quot;:716,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2-0_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 424w, https://substackcdn.com/image/fetch/$s_!2-0_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 848w, https://substackcdn.com/image/fetch/$s_!2-0_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 1272w, https://substackcdn.com/image/fetch/$s_!2-0_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04a2f9db-3619-43fa-a752-6a6278ab3ab9_716x186.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Another figure from Kung (1982): connections depend on the number of inputs and outputs for each PE.</figcaption></figure></div><p>Data flows between cells in a pipelined fashion, and communication with the outside world happens only at the boundary cells.</p><h3>The foundation of the TPU: a MAC systolic network</h3><div class="captioned-image-container"><figure><a 
class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R98b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R98b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 424w, https://substackcdn.com/image/fetch/$s_!R98b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 848w, https://substackcdn.com/image/fetch/$s_!R98b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 1272w, https://substackcdn.com/image/fetch/$s_!R98b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!R98b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png" width="200" height="240" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:240,&quot;width&quot;:200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7317,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/190643644?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!R98b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 424w, https://substackcdn.com/image/fetch/$s_!R98b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 848w, https://substackcdn.com/image/fetch/$s_!R98b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 1272w, https://substackcdn.com/image/fetch/$s_!R98b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feea556ec-2fc8-45c1-bcce-b7ebf6a543b2_200x240.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>A <strong>multiply-accumulate (MAC)</strong> PE has two input edges and two output edges. 
In the form drawn below (&#8220;weight-stationary&#8221;), the weight <em>w</em> is a parameter loaded into the PE. The data <em>x</em> flows in and is passed unchanged from left to right, and the current &#8220;accumulation&#8221; <em>b</em> flows in from the top (usually from a PE connected to the north). The PE does the multiply-accumulate (<em>x * w + b</em>) and passes the accumulated sum down. We assume that the calculation happens in a single &#8220;tick&#8221; or clock cycle.</p><p>A PE in a systolic network is typically a simple compute primitive. Its power comes from being connected to other PEs to express complex calculations.</p><p>The easiest way to understand how a weight-stationary systolic <em>array</em> works is to see how it computes a <strong>dot product</strong>. This is shown in the following image for 3 cycles, and we will walk through the computation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MCl3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MCl3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 424w, https://substackcdn.com/image/fetch/$s_!MCl3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 848w, https://substackcdn.com/image/fetch/$s_!MCl3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!MCl3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MCl3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp" width="716" height="638" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:638,&quot;width&quot;:716,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13260,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/190643644?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MCl3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 424w, https://substackcdn.com/image/fetch/$s_!MCl3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 848w, https://substackcdn.com/image/fetch/$s_!MCl3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!MCl3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85f3020-8db6-4b67-8b17-ccc2f728ac47_716x638.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Read this image column-wise, starting from the left.</figcaption></figure></div><p>In each cycle, a new entry of <em>x</em> appears from the left, and one term is added to the dot product. The column of PEs contains a vector of weights. 
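</p><p>As a rough illustration of this walk-through, here is a minimal Python sketch of one weight-stationary column. It is a simplified model written for this explanation, not how any real MXU is implemented: the function names <em>mac_pe</em> and <em>column_dot_product</em> and the example numbers are made up, and the loop models the time skew by handling one row per tick.</p>

```python
def mac_pe(x, w, b):
    """One tick of a weight-stationary MAC PE.

    x enters from the left, b (the partial sum) enters from the top,
    and w is the weight already stored in the PE.
    Returns (x passed on to the right, x * w + b passed down).
    """
    return x, x * w + b


def column_dot_product(weights, xs, b_in=0):
    """Simulate one column of PEs, top to bottom, one row per tick.

    Row i only needs xs[i] at tick i, which is why the inputs
    are fed in skewed (diagonally) in time.
    """
    partial_sum = b_in
    trace = []  # (tick, partial sum leaving that row)
    for tick, (w, x) in enumerate(zip(weights, xs), start=1):
        _, partial_sum = mac_pe(x, w, partial_sum)
        trace.append((tick, partial_sum))
    return partial_sum, trace


# Hypothetical 3-row column: w = [2, 3, 4], x = [1, 5, 6], b = 10
result, trace = column_dot_product([2, 3, 4], [1, 5, 6], b_in=10)
print(trace)   # [(1, 12), (2, 27), (3, 51)]
print(result)  # 51 = 10 + 2*1 + 3*5 + 4*6
```

<p>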
After 3 cycles, the column has accumulated the full dot product <em><strong>b + w&#183;x</strong></em>.</p><p>We now draw the exact same operation, but in an abridged form (not showing the intermediate calculations and instead just showing the inputs and outputs at the ticks when they appear).</p><ul><li><p>A column of the array is drawn as a <em>vector</em> weight <em>w<sub>i</sub></em></p></li><li><p>The inputs are drawn as a diagonal (they enter the array skewed in time)</p></li><li><p>The output is shown at the bottom, appearing <em>R</em> cycles after the input hits row 1, where <em>R</em> is the number of PEs in the column</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UyAL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UyAL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 424w, https://substackcdn.com/image/fetch/$s_!UyAL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 848w, https://substackcdn.com/image/fetch/$s_!UyAL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!UyAL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UyAL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp" width="942" height="494" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:494,&quot;width&quot;:942,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15508,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/190643644?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UyAL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 424w, https://substackcdn.com/image/fetch/$s_!UyAL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 848w, https://substackcdn.com/image/fetch/$s_!UyAL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!UyAL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93d60b2-dbc1-4100-92e6-1d8001182424_942x494.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In this form it is easy to see that we are <strong>pipelining</strong> different <em>x<sub>i</sub></em> by starting the second one the cycle after the first one. 
The initial accumulator value can be set to <em>b<sub>i</sub></em>, an affine bias.</p><p>So, with a single-column systolic array, holding a column vector <em>w</em>, we are computing <em><strong>y = b + X&#183;w</strong></em>, where the rows of <em>X</em> are <em>x<sub>1</sub></em>, <em>x<sub>2</sub></em>, &#8230;</p><p>It is also worth noting the latency between when <em>x<sub>i</sub></em> starts being input and when we receive the output: the first element of <em>x<sub>1</sub></em> enters the array at time <em>t=1</em>, and we get the result out at <em>t=R</em>, so the latency is <em>R-1</em> cycles.</p><p>Making this a <strong>2D array</strong> (recall that the input x&#8217;s are bypassed to the right from each PE), we see that <em>x<sub>i</sub></em> will just arrive to interact with <em>w<sub>2</sub></em> one cycle later. We can appropriately skew the columns of the B matrix:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X1yw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X1yw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 424w, https://substackcdn.com/image/fetch/$s_!X1yw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 848w, https://substackcdn.com/image/fetch/$s_!X1yw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 
1272w, https://substackcdn.com/image/fetch/$s_!X1yw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X1yw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp" width="960" height="760" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:760,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18578,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/190643644?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X1yw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 424w, https://substackcdn.com/image/fetch/$s_!X1yw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 848w, https://substackcdn.com/image/fetch/$s_!X1yw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 
1272w, https://substackcdn.com/image/fetch/$s_!X1yw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F635ea2c7-b84a-4cd5-af8a-ce5e97983ad5_960x760.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The operation that it executes is <em><strong>Y = B + X&#183;W</strong></em>, where <em>b<sub>ij</sub> </em>above is in row <em>i</em> and column <em>j</em> of <em>B</em>, and <em>W = [w1, w2]</em> is the fixed weight matrix, which is loaded in first. 
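</p><p>To make the timing concrete, here is a minimal cycle-level sketch in Python. It is my own illustrative model, not RTL and not taken from any particular accelerator. It assumes a weight-stationary array in which PE (r, c) holds one weight, each row of <em>X</em> enters from the left skewed by one tick per array row and is passed to the right, and each bias entry enters the top of its column skewed by one tick per column and is accumulated downward. The names <code>systolic_matmul</code>, <code>x_reg</code> and <code>p_reg</code>, and the zero-based tick counter, are hypothetical. Under these assumptions, the last output leaves the array after m + R + C - 2 ticks for an m-row input and an R-by-C weight matrix.</p><pre><code>def systolic_matmul(X, W, B):
    """Cycle-level model of Y = B + X W on an R-by-C weight-stationary array."""
    m, R, C = len(X), len(W), len(W[0])
    x_reg = [[0] * C for _ in range(R)]  # x value latched in each PE (moves right)
    p_reg = [[0] * C for _ in range(R)]  # partial sum latched in each PE (moves down)
    Y = [[None] * C for _ in range(m)]
    ticks = m + R + C - 2                # tick count until the last output appears
    for t in range(ticks):
        new_x = [[0] * C for _ in range(R)]
        new_p = [[0] * C for _ in range(R)]
        for r in range(R):
            for c in range(C):
                if c == 0:
                    i = t - r            # skewed feed: X[i][r] enters row r at tick i + r
                    x_in = X[i][r] if i in range(m) else 0
                else:
                    x_in = x_reg[r][c - 1]   # x bypassed from the left neighbor
                if r == 0:
                    i = t - c            # skewed bias: B[i][c] enters column c at tick i + c
                    p_in = B[i][c] if i in range(m) else 0
                else:
                    p_in = p_reg[r - 1][c]   # partial sum from the upper neighbor
                new_x[r][c] = x_in
                new_p[r][c] = p_in + x_in * W[r][c]  # the MAC performed by this PE
        x_reg, p_reg = new_x, new_p
        for c in range(C):               # read finished sums leaving the bottom row
            i = t - (R - 1) - c
            if i in range(m):
                Y[i][c] = p_reg[R - 1][c]
    return Y, ticks


X = [[1, 2, 3], [4, 5, 6]]
W = [[1, 0], [0, 1], [1, 1]]
B = [[10, 20], [30, 40]]
print(systolic_matmul(X, W, B))  # prints ([[14, 25], [40, 51]], 5)
</code></pre><p>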
If <em>W</em> is <em>n&#215;n</em>, and <em>X</em> is <em>m&#215;n</em>, the matrix product is <em>O(mn<sup>2</sup>)</em> operations (as is standard), but due to the <strong>structurally-enforced pipelining</strong>, it was completed in <em>O(m+n)</em> cycles!</p><p>And just like that, with a very simple MAC-computing PE, we can build up the matrix multiplication hardware unit that is the core of most AI hardware accelerators.</p><p>There is much more to be said about how it is implemented in RTL, how it performs, how the matrix shapes affect utilization, and the total latency, throughput and efficiency benefits. We will go over that, along with intuitive insights, in an upcoming <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Chip Insights&quot;,&quot;id&quot;:2850528,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/chipinsights&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png&quot;,&quot;uuid&quot;:&quot;56f3a94e-1fd9-493e-983f-7beedc9b2d68&quot;}" data-component-name="MentionToDOM"></span> post. In the remainder of this article, we will turn our attention to other, more overlooked, uses of systolic networks in applications to broader AI, robotics, and numerical methods.</p>
<h3>Moving beyond MAC</h3><p>Keeping the exact same 2D array structure and the skewed input feeding, observe that we had two underlying binary operations: the PE (node) computed <strong>multiply (</strong><em><strong>&#183;</strong></em><strong>)</strong> and the arrow (edge) computed <strong>sum (+)</strong>. In general, the array will compute the result with those operators replaced by other suitable binary operators, &#8857; at the nodes and &#8853; along the edges: <strong>(</strong><em><strong>x<sub>1</sub></strong></em><strong>&#8857;</strong><em><strong>w<sub>1</sub></strong></em><strong>) &#8853; (</strong><em><strong>x<sub>2</sub></strong></em><strong>&#8857;</strong><em><strong>w<sub>2</sub></strong></em><strong>) &#8853; &#8943;</strong></p><p>I&#8217;ll be brief with these and list further reading below, and try to draw special attention to ones that are interesting for applications in robotics and AI.</p><h4>1) Pattern matching (Kung group)</h4><p>Using an equality comparison (=) at each PE and logical and (&#8743;) along the edges: <em><strong>y</strong></em><strong> = (</strong><em><strong>x<sub>1</sub></strong></em><strong> = </strong><em><strong>w<sub>1</sub></strong></em><strong>) &#8743; (</strong><em><strong>x<sub>2</sub></strong></em><strong> = </strong><em><strong>w<sub>2</sub></strong></em><strong>) &#8743; &#8943;</strong></p><p>This will return <strong>1</strong> if every entry of the vector <em>x</em> matches the corresponding entry of the vector <em>w</em>, and <strong>0</strong> otherwise.</p><h4>2) Sorting (Kung group)</h4><p>This one is fascinating and intuitive. 
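</p><p>As a point of reference, here is a minimal software sketch of the odd-even (transposition) sort that this section maps onto hardware. It is only an illustration under my own assumptions: the function name <code>odd_even_transposition_sort</code> is hypothetical, and each compare-and-swap is written with <code>min</code> and <code>max</code> to mimic what a single PE would do. The pairs within one phase do not overlap, so separate PEs could handle them in the same cycle. That is why <em>n</em> phases are enough, even though the total number of compare-and-swaps grows quadratically.</p><pre><code>def odd_even_transposition_sort(values):
    """Sort a list with n alternating phases of neighbor compare-and-swap."""
    a = list(values)
    n = len(a)
    for phase in range(n):
        start = phase % 2                 # alternate between even and odd pairs
        for i in range(start, n - 1, 2):  # disjoint pairs: parallel in hardware
            # One PE: keep the smaller value on the left, the larger on the right.
            a[i], a[i + 1] = min(a[i], a[i + 1]), max(a[i], a[i + 1])
    return a, n                           # sorted list and the number of phases used


print(odd_even_transposition_sort([5, 1, 4, 2, 3]))  # prints ([1, 2, 3, 4, 5], 5)
</code></pre><p>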
Each PE performs a simple compare-and-swap operation, and passes the max downward and the min rightward. With <em>n</em> rows and <em>n</em> columns, it will execute the <a href="https://en.wikipedia.org/wiki/Odd%E2%80%93even_sort">odd-even sort</a> algorithm and produce the sorted array.</p><p>A glance at that Wikipedia page reveals both a weakness and a strength of systolic arrays. They can only execute algorithms that work over local connections (the odd-even sort takes <em>O(n<sup>2</sup>)</em> operations, more than asymptotically optimal algorithms need), but as in matrix multiplication above, the latency is <em>O(n)</em>. While the best comparison sorts take <em>O(n log n)</em> steps sequentially on scalar hardware, the systolic network lets suboptimal algorithms complete with lower latency.</p><h4>3) 2D motion planning</h4><p>Deterministic motion planning (identifying environmental obstacles and planning a path through free areas respecting the system dynamics) is a fundamental problem in robotics. 
About 10 years ago there was even an attempt to <a href="https://spectrum.ieee.org/motionplanning-chip-speeds-robots">build chips to solve this problem</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kdee!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kdee!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 424w, https://substackcdn.com/image/fetch/$s_!kdee!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 848w, https://substackcdn.com/image/fetch/$s_!kdee!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 1272w, https://substackcdn.com/image/fetch/$s_!kdee!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kdee!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png" width="444" height="407.64912280701753" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:684,&quot;resizeWidth&quot;:444,&quot;bytes&quot;:446396,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/190643644?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kdee!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 424w, https://substackcdn.com/image/fetch/$s_!kdee!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 848w, https://substackcdn.com/image/fetch/$s_!kdee!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 1272w, https://substackcdn.com/image/fetch/$s_!kdee!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa234c2f2-602f-4c18-b417-0d6a2b1390f9_684x628.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Grid-based motion planning using dynamic programming (<a href="https://modernrobotics.northwestern.edu/nu-gm-book-resource/10-4-grid-methods-for-motion-planning/">source</a>)</figcaption></figure></div><p>Dynamic programming solutions (including Dijkstra&#8217;s algorithm, A*) can be implemented by local and iterative propagation from the goal, and just as with odd-even sort, the nearest-neighbor connection pattern can be mapped well to a systolic array.</p><p>Unfortunately, the number of grid cells grows exponentially with the dimension of the ambient space, and this is problematic if we need to have one PE per cell. 
This makes systolic motion planning impractical unless we only have a 2D problem to solve, but I think it is an interesting application nonetheless.</p><h4>4) Stereo vision semi-global matching</h4><p>A PE that accumulates matching costs along a scanline can be used to form a systolic array that implements <a href="https://en.wikipedia.org/wiki/Semi-global_matching">semi-global matching</a> (SGM). This algorithm is used to calculate disparity in the very popular <a href="https://github.com/realsenseai/librealsense/discussions/11586">Intel RealSense camera ASICs</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YcYE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YcYE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 424w, https://substackcdn.com/image/fetch/$s_!YcYE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 848w, https://substackcdn.com/image/fetch/$s_!YcYE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 1272w, https://substackcdn.com/image/fetch/$s_!YcYE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!YcYE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png" width="250" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46126588-8883-46e7-8186-8ba28ce42e09_250x250.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:250,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YcYE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 424w, https://substackcdn.com/image/fetch/$s_!YcYE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 848w, https://substackcdn.com/image/fetch/$s_!YcYE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 1272w, https://substackcdn.com/image/fetch/$s_!YcYE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46126588-8883-46e7-8186-8ba28ce42e09_250x250.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Grid-based SGM depiction (<a href="https://en.wikipedia.org/wiki/Semi-global_matching">source</a>)</figcaption></figure></div><p>SGM systolic arrays on FPGAs run this at pixel rate, processing one scanline per clock, and deterministic low-latency computation is obviously paramount here.</p><h4>5) Matrix decompositions for numerical methods</h4><p>To some extent, I&#8217;ve saved the most promising (at least in my view) for last. Matrix decompositions that aid in factorization are key to solving systems of equations, and this is ubiquitous in all sorts of robotics and general problems.</p><p><strong>5.1) QR decomposition. 
</strong>This matrix factorization is the numerically stable way to solve <strong>least squares or pseudoinverses</strong> in overdetermined systems, and has applications to robot kinematics, SLAM, sensor fusion, online parameter estimation, etc. Additionally, it is a key component of <strong>quadratic program (QP) solvers</strong>: in active set solvers after the active set is identified, for Jacobian factorization in SQP, etc. These workloads are important in <a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">low-level control in robotics</a> and typically need deterministic and low-latency solutions. The Givens rotations method (gentle explainer <a href="https://kwokanthony.medium.com/detailed-explanation-with-example-on-qr-decomposition-by-givens-rotation-6e7bf664fbdd">here</a>) performs local operations on 2x2 submatrices, which lends itself very well to locally-connected CORDIC-implementing PEs in a systolic array.</p><p><strong>5.2) Cholesky decomposition for symmetric positive-definite (SPD) matrices. </strong>This is a slightly easier factorization if the matrix is SPD, which comes up for example in state estimation, Kalman filtering, normal equations in interior point methods, etc. These workloads would come up in dedicated state estimation blocks in robotics pipelines. For decomposing <em>A = LL<sup>T</sup></em> with lower-triangular <em>L</em>, each PE computes one entry of L using only its left and upper neighbors, making the data dependencies purely local. 
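</p><p>As a rough software sketch of that dependency pattern, here is a sequential Cholesky factorization in Python. This is my own illustration, not a systolic implementation: I am assuming the &#8220;right-looking&#8221; ordering (finish one column of <em>L</em>, then update the remaining lower-right block), and the function name <code>cholesky_right_looking</code> and the working copy <code>S</code> are hypothetical. In a triangular array, the work done here for each entry (i, j) is what I would expect a single PE to take on.</p><pre><code>import math


def cholesky_right_looking(A):
    """Return lower-triangular L with A = L L^T, for a small SPD matrix A."""
    n = len(A)
    S = [row[:] for row in A]            # working copy; only the lower triangle is used
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        L[k][k] = math.sqrt(S[k][k])     # diagonal entry of column k
        for i in range(k + 1, n):
            L[i][k] = S[i][k] / L[k][k]  # rest of column k
        # Update the remaining block using only the column that was just computed.
        for i in range(k + 1, n):
            for j in range(k + 1, i + 1):
                S[i][j] -= L[i][k] * L[j][k]
    return L


A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
print(cholesky_right_looking(A))  # prints [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
</code></pre><p>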
This is repeated on the smaller matrix until completion.</p><p>Both of the systolic implementations referred to above use non-MAC PEs and a triangular (not rectangular) network; this is very uncommon in current hardware, but was represented in the Kung references above.</p><p>For this post, I wanted to stick to high-level intuitive descriptions, but in the open-source <a href="https://github.com/avikde/tiny-xpu">TinyXPU project</a>, we will aim to implement and analyze some of these non-traditional systolic networks for robotics and AI pipelines. Stay tuned for the upcoming post introducing this project!</p><h3>Conclusion</h3><p>Systolic arrays were invented earlier than people think, and are more general than people think. They can have deterministic (no cache misses) <strong>high throughput and energy efficiency</strong> for algorithms that can work on local data. However, they are bad for working with sparse data (e.g. for sparse linear system solving), and bad for algorithms that need global data (e.g. Householder QR, which needs to operate on a full matrix column at a time).</p><p>In the deep neural network boom, the MAC workload is so dominant (&gt;95% of operations in any DNN) that the non-MAC compute takes a tiny fraction of time. Dedicating a full systolic array with <em>n<sup>2</sup></em> PEs to non-MAC operations would be area-inefficient for neural net workloads. This is why commercial vendors have not explored the co-design of systolic networks with algorithms, including PEs that can do MAC but also other functions like Givens rotations on one chip. 
For robotics workloads and other general scientific methods, the mix of primitives is different and (in my opinion) worth revisiting.</p><h3>Further reading</h3><ul><li><p><a href="https://www.eecs.harvard.edu/~htk/publication/1982-kung-why-systolic-architecture.pdf">Kung 1982: Why Systolic Architectures</a> - Great high-level overview of the motivation behind systolic architectures</p></li><li><p><a href="https://swh.princeton.edu/~kung/papers_pdf/New%20Folder/VLSI%20Array%20Processors.pdf">Kung 1982: VLSI Array Processors</a> - Further detail on applications such as QR decomposition</p></li><li><p><a href="https://arxiv.org/pdf/1704.04760">Google TPU v1 paper</a></p></li></ul><p>Related posts:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;47f1c92a-14f6-478c-8016-691a6b344522&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The architecture behind &#8220;end-to-end&#8221; robotics pipelines&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-26T21:19:56.368Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!766O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-architecture-behind-end-to-end&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:185869291,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:18,&quot;comment_count&quot;:15,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z7FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:187337389,&quot;url&quot;:&quot;https://chipinsights.net/p/mapping-algorithms-to-custom-silicon&quot;,&quot;publication_id&quot;:2850528,&quot;publication_name&quot;:&quot;Chip Insights&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Z-fT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png&quot;,&quot;title&quot;:&quot;Mapping algorithms to custom silicon - Part 
1&quot;,&quot;truncated_body_text&quot;:null,&quot;date&quot;:&quot;2026-02-09T00:15:44.482Z&quot;,&quot;like_count&quot;:22,&quot;comment_count&quot;:0,&quot;bylines&quot;:[{&quot;id&quot;:178190448,&quot;name&quot;:&quot;Bharath Suresh&quot;,&quot;handle&quot;:&quot;bharathw&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23b7c14a-5bd1-4a78-9ac8-c5d6eda62bfc_2048x2048.jpeg&quot;,&quot;bio&quot;:&quot;Engineer and Writer&quot;,&quot;profile_set_up_at&quot;:&quot;2024-08-04T01:39:48.025Z&quot;,&quot;reader_installed_at&quot;:&quot;2024-09-23T00:13:37.585Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:2896802,&quot;user_id&quot;:178190448,&quot;publication_id&quot;:2850528,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:2850528,&quot;name&quot;:&quot;Chip Insights&quot;,&quot;subdomain&quot;:&quot;chipinsights&quot;,&quot;custom_domain&quot;:&quot;chipinsights.net&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Semiconductor Industry Deep Dives&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png&quot;,&quot;author_id&quot;:178190448,&quot;primary_user_id&quot;:178190448,&quot;theme_var_background_pop&quot;:&quot;#9A6600&quot;,&quot;created_at&quot;:&quot;2024-08-04T01:42:57.274Z&quot;,&quot;email_from_name&quot;:&quot;Chip Insights&quot;,&quot;copyright&quot;:&quot;Bharath 
Suresh&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;newspaper&quot;,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}},{&quot;id&quot;:3076811,&quot;user_id&quot;:178190448,&quot;publication_id&quot;:3023929,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:3023929,&quot;name&quot;:&quot;Bharath&#8217;s Musings&quot;,&quot;subdomain&quot;:&quot;bharathw&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;A place for my thoughts&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/afa88b37-7ced-4dd5-bdcb-580f7442001d_608x608.png&quot;,&quot;author_id&quot;:178190448,&quot;primary_user_id&quot;:null,&quot;theme_var_background_pop&quot;:&quot;#FF6719&quot;,&quot;created_at&quot;:&quot;2024-09-16T02:30:59.184Z&quot;,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Bharath Suresh&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;newspaper&quot;,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null,&quot;status&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:null,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:null,&quot;paidPublicationIds&quot;:[],&quot;subscriber&quot;:null}},{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik 
De&quot;,&quot;handle&quot;:&quot;avikde&quot;,&quot;previous_name&quot;:&quot;Avik&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. and founder&quot;,&quot;profile_set_up_at&quot;:&quot;2025-09-01T11:05:25.762Z&quot;,&quot;reader_installed_at&quot;:&quot;2025-12-14T02:43:43.888Z&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;status&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:1,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:{&quot;type&quot;:&quot;subscriber&quot;,&quot;tier&quot;:1,&quot;accent_colors&quot;:null},&quot;paidPublicationIds&quot;:[1063960],&quot;subscriber&quot;:null},&quot;primaryPublicationId&quot;:7287367,&quot;primaryPublicationName&quot;:&quot;min{power}&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://www.avikde.me&quot;,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://www.avikde.me/subscribe?&quot;}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://chipinsights.net/p/mapping-algorithms-to-custom-silicon?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!Z-fT!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74222e4c-9d04-46aa-82ba-7d82759b48b9_512x512.png" loading="lazy"><span class="embedded-post-publication-name">Chip Insights</span></div><div class="embedded-post-title-wrapper"><div 
class="embedded-post-title">Mapping algorithms to custom silicon - Part 1</div></div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">3 months ago &#183; 22 likes &#183; Bharath Suresh and Avik De</div></a></div>]]></content:encoded></item><item><title><![CDATA[Debugging as architecture insight: dissecting a VLA]]></title><description><![CDATA[Part 3: Hands-on debugging of a vision-language-action model as a lens into architecture, safety, and verifiability]]></description><link>https://www.avikde.me/p/debugging-as-architecture-insight</link><guid isPermaLink="false">https://www.avikde.me/p/debugging-as-architecture-insight</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Thu, 26 Feb 2026 15:46:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zyjp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This article is part of a series on end-to-end robotics pipelines. I&#8217;d recommend at least reading part 1 after this article.</em></p><ol><li><p><a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">The architecture behind &#8220;end-to-end&#8221; robotics pipelines</a></p></li><li><p><a href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation?r=5vzx85">Online motor adaptation</a></p></li><li><p>This article</p></li><li><p><a href="https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">Closing the action loop with a VLM &#8220;agent&#8221;</a></p></li><li><p><a href="https://www.avikde.me/p/building-a-reasoning-hierarchical">Demo combining the best features of end-to-end and classical approaches</a></p></li></ol><div><hr></div><p>In this part, we get hands-on and build a VLA pipeline from scratch. 
I&#8217;ll be transparent about my starting point: while I have experience with model-based methods, RL controllers, and LLMs/VLMs, generalist end-to-end policies &#8212; almost exclusively being realized today as Vision-Language-Action (VLA) models &#8212; were new territory. That makes this post a useful vantage point to evaluate their strengths and weaknesses from first principles, and should be interesting to those who have never heard of VLAs as well as those who use them daily.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h3>&#8220;Pick up the red block&#8221;</h3><p>The demo is simple: take a specified prompt (like the one in the heading above), run it through the model, and visualize the actions that the model outputs. Obviously, when it is run in closed loop, you would get motion that hopefully results in the action described by the prompt, but there was so much to dig into with just this visualization that it made sense to spend an article on it. 
In the next part, we will close the action loop and explore some of the low-level controller facets mentioned in part 1.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zyjp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zyjp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 424w, https://substackcdn.com/image/fetch/$s_!zyjp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 848w, https://substackcdn.com/image/fetch/$s_!zyjp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 1272w, https://substackcdn.com/image/fetch/$s_!zyjp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zyjp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png" width="490" height="348.4248424842484" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:790,&quot;width&quot;:1111,&quot;resizeWidth&quot;:490,&quot;bytes&quot;:245031,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6565620c-f18d-49d5-b581-9cd7f5732c26_1111x1001.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!zyjp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 424w, https://substackcdn.com/image/fetch/$s_!zyjp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 848w, https://substackcdn.com/image/fetch/$s_!zyjp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 1272w, https://substackcdn.com/image/fetch/$s_!zyjp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F933ecd9e-ec94-413d-a1a0-b8aa34d39893_1111x790.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In the following animation, the configuration of the arm is changed using the sliders (while being given the same prompt), showing that the output action is responsive to the robot and environment state.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;e1d0a884-a0a4-486f-b083-02fcc52043b7&quot;,&quot;duration&quot;:null}"></div><p>The learning journey for this article is captured in a Jupyter notebook that can be accessed and run for free on colab &#8212; <a href="https://colab.research.google.com/github/avikde/vla-pipeline/blob/main/xvla_widowx_vis_traj.ipynb">click here</a>. 
All details on the software stack are in the <a href="https://github.com/avikde/vla-pipeline">open-source GitHub repository for this project</a> (which also hosts the notebook file). If you find it a helpful learning tool or template, I&#8217;d welcome any feedback, fixes, contributions, stars, forks, etc.</p><p>First, let&#8217;s quickly go over what a VLA is.</p><h3>Anatomy of a Vision-Language-Action (VLA) model</h3><p>A Vision-Language-Action model has three functional components: a vision encoder, a language encoder, and an action head. In practice, the vision and language encoders are almost always a single pretrained VLM, i.e. the vision and language processing are already jointly trained before the action head is added. This means the &#8220;vision encoder&#8221; and &#8220;language encoder&#8221; aren&#8217;t independently tunable modules; they&#8217;re entangled by pretraining.</p><p>The architecturally interesting variation is in how the action head attaches to the VLM, and how much of the VLM is modified during robot training. These design choices have large downstream consequences for what you can and cannot inspect at inference time.</p><h4>VLA &#8220;action head&#8221; architectures</h4><p>Two illuminating (but not exhaustive) designs:</p><p><strong><a href="https://octo-models.github.io/">Octo</a></strong> uses a dedicated readout token &#8212; a learned embedding (~384-dim) that aggregates action-relevant information from the transformer before a small decode network produces actions. 
This bottleneck is the closest thing to an inspectable interface in any current VLA: you can probe whether the readout encodes directional intent, object identity, or nothing interpretable.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;43b164dd-131b-4071-b3b0-869496610567&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Transformer &#8594; readout_action embedding (384-dim)
                        &#8595;
            Action Head (direct decode)
                        &#8595;
                    Actions</code></pre></div><p><strong><a href="https://thu-air-dream.github.io/X-VLA/">X-VLA</a></strong> processes images, language, proprioception, and noisy action candidates together in a single 24-layer transformer, conditioned by 32 learnable soft prompt tokens selected per embodiment. Flow matching then iteratively refines the action chunk over 10 steps. Action-relevant information is distributed across all layers and token types simultaneously.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;4d5e3b02-48c6-4434-8b2d-3edee2a6b173&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Input: Images + Language + Proprio + Domain ID
               &#8595;
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;  Soft Prompt Selection (per embodiment)   &#9474;
&#9474;  Domain 0 &#8594; Prompt_0 (32 learnable tokens)&#9474;
&#9474;  Domain 1 &#8594; Prompt_1 (32 learnable tokens)&#9474;
&#9474;  Domain N &#8594; Prompt_N (32 learnable tokens)&#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
               &#8595;
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;  Unified Transformer Stack (24 layers)   &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9474;
&#9474;  &#9474; [Soft Prompt | Vision | Lang |     &#9474;  &#9474;
&#9474;  &#9474;  Proprio | Noisy Actions]          &#9474;  &#9474;
&#9474;  &#9474;                                    &#9474;  &#9474;
&#9474;  &#9474;  All processed together with       &#9474;  &#9474;
&#9474;  &#9474;  standard self-attention           &#9474;  &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
                &#8595;
       Flow Matching (10 steps)
                &#8595;
       Action Chunk (32 actions)</code></pre></div><p>The soft prompts enable efficient cross-embodiment adaptation: only ~9M parameters (1% of the model) need updating for a new robot. But they also mean embodiment-specific behavior is encoded in vectors with no interpretable structure.</p><p>The deeper point applies to both architectures: even where a vector interface exists between components (Octo&#8217;s readout token, X-VLA&#8217;s soft prompts), end-to-end training means those vectors don&#8217;t have a physical interpretation that safety constraints can be applied to.</p><p>There is more to be said on action chunking and control bandwidth, which I plan to cover in the next part of the series.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/subscribe?"><span>Subscribe now</span></a></p><h4>Model choice</h4><p>I approached this as a user rather than a researcher: published weights only, no training data collection, and no fine-tuning iterations before deployment. The target application is tabletop pick-and-place on a WidowX, which is a common manipulation benchmark and exposes the control and perception properties I care about. Another soft constraint was that I&#8217;d be able to run it on my personal laptop (12GB VRAM).</p><p>These constraints limit which VLAs can be tried. <a href="https://huggingface.co/openvla/openvla-7b">OpenVLA-7B</a> requires task-specific fine-tuning and won&#8217;t fit in 12GB without quantization. <a href="https://huggingface.co/docs/lerobot/en/pi0">&#960;0</a> needs 24GB+. <a href="https://github.com/NVIDIA/Isaac-GR00T/blob/main/getting_started/hardware_recommendation.md">GR00T</a> requires a Jetson Thor. 
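</p><p>As a rough sanity check on the memory constraint, the weight footprint of a model can be estimated from its parameter count. This is a back-of-the-envelope sketch rather than a measurement: the helper names are hypothetical, the bytes-per-parameter values are just the standard dtype sizes, the 12GB budget is treated as 12 GiB for simplicity, and activations, caches, and framework overhead are ignored, so real usage will be higher.</p>
<pre>
```python
# Back-of-the-envelope estimate of weight memory (illustrative sketch).
# Assumption: only the weights are counted; activations, caches, and
# framework overhead are ignored, so real usage will be higher.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


def fits_in_budget(num_params: float, dtype: str, budget_gib: float) -> bool:
    """True if the weights alone fit within the memory budget."""
    return budget_gib >= weight_memory_gib(num_params, dtype)


# Parameter counts as quoted in the text (7B for OpenVLA-7B, 93M for Octo).
MODELS = {"OpenVLA-7B": 7e9, "Octo": 93e6}
BUDGET_GIB = 12.0  # laptop VRAM, treating 12GB as 12 GiB

for name, params in MODELS.items():
    for dtype in ("fp32", "fp16", "int4"):
        need = weight_memory_gib(params, dtype)
        verdict = "fits" if fits_in_budget(params, dtype, BUDGET_GIB) else "too large"
        print(f"{name:12s} {dtype}: {need:6.2f} GiB ({verdict})")
```
</pre>
<p>Under these assumptions, the 7B-parameter weights alone come to roughly 13 GiB in fp16, which is consistent with needing quantization on a 12GB card, while Octo&#8217;s 93M parameters stay well under 1 GiB even in fp32.</p><p>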
<a href="https://deepmind.google/blog/gemini-robotics-on-device-brings-ai-to-local-robotic-devices/">Gemini Robotics On-Device</a> is trained on dual-arm configurations and isn&#8217;t publicly accessible. Octo (93M params) fits the hardware, but its pretraining doesn&#8217;t support zero-shot transfer, so it needs fine-tuning. <a href="https://huggingface.co/docs/lerobot/en/smolvla">SmolVLA</a> likewise requires fine-tuning.</p><p>X-VLA seems to fit the bill. Its soft-prompt architecture was designed for cross-embodiment zero-shot transfer, and <a href="https://huggingface.co/lerobot/xvla-widowx">xvla-widowx</a> provides a checkpoint fine-tuned on BridgeData for the WidowX embodiment specifically, meaning embodiment adaptation is handled, while task generalization remains zero-shot. It also has an <code>ee6d</code> (end-effector coordinates) action output mode, which appealed to me because it would eliminate kinematics-related variability.</p><h3>What&#8217;s different about VLAs: task programming</h3><p>VLAs have been heralded as revolutionary for robotics, and it&#8217;s true: the prospect of robot programming with natural language is a decided shift. Thinking about my own fielded robotics experience at Ghost Robotics, either (a) customers would directly command the robot, (b) customers would pick between preprogrammed tasks (which can be considered a fixed small vocabulary of commands), or (c) the robot would start its own tasks. Natural language commands expand the set of tasks <em>without retraining or reprogramming</em>. The natural language interface changes <em>who</em> can program a robot, not just <em>what</em> it can do. With a VLA, a non-technical operator can in principle specify novel tasks.</p><p>The flip side, to be fair: natural language as an interface trades a small precise vocabulary (preprogrammed tasks) for a large ambiguous one. 
&#8220;Pick up the red block&#8221; sounds more expressive than running the &#8220;pick_red&#8221; preprogrammed task, but as the next section will show, the boundary of what the model actually understands is opaque in a way that a fixed command vocabulary is not.</p><h3>What&#8217;s different about VLAs: calibration and debugging</h3><p>With classical methods, the process of setting up and debugging a task includes several well-delineated steps:</p><ul><li><p>calibrate cameras &#8594; check camera detection overlay &#8594; perception &#9989;</p></li><li><p>calibrate joints &#8594; send arm &#8220;move up&#8221; command and ensure it moves as expected &#8594; actuators &#9989;</p></li></ul><p>With VLAs, there are a few reasons why this kind of unit testing or debugging is simply not possible.</p><ol><li><p>Parameters such as camera extrinsics or joint torque constants are not isolated: models are typically trained on datasets with multiple camera angles and no explicit calibration, and the network learns spatial transforms end-to-end.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> Another example: swapping the camera lens for a fisheye to get a wider FOV won&#8217;t generalize without retraining, unlike traditional vision where you just recalibrate intrinsics.</p></li><li><p>There aren&#8217;t obvious equivalents of non-end-to-end interfaces such as the camera detection overlay or a &#8220;move up&#8221; command, but we will try to come up with methods to work around this in the next section.</p></li><li><p>Randomness: the trajectory can vary even with no environmental change. Flow matching stochasticity is in the action head specifically; the VLM backbone is deterministic given the same input. X-VLA uses 10-step flow matching. 
Even with the same seed, limited numerical precision in GPU ops causes drift by step 5-6.</p></li></ol><p>Due to a combination of 1 and 2 (and slightly exacerbated by 3), it can be complex to reason about the root cause of a failure. Is the failure due to (a) vision error, (b) action discretization error, (c) world model mismatch, or (d) all three? When can you dismiss a failure as being out of distribution vs. not?</p><p>For a developer, this ambiguity is an inconvenience; with enough time, you can run more experiments and form hypotheses (as we do in the next section). For a deployed system in customer hands, the same ambiguity becomes a safety issue: the robot has no reliable mechanism to detect that it is out of distribution and should stop. Classical systems fail loudly (joint limit hit, object not detected, planner infeasible); VLAs fail silently, producing plausible-looking but wrong trajectories. This isn&#8217;t a criticism of VLAs specifically, but it is a structural consequence of end-to-end training, and it applies equally to any system where the failure boundary is defined implicitly by a training distribution rather than explicitly by an engineer.</p><h3>VLA debugging ideas and techniques</h3><p>Despite the structural challenges mentioned above, I had a fascinating experience coming up with ways to probe and understand what the VLA was doing.</p><h4>Passive debugging: inspect what the model is already computing</h4><ol><li><p><strong>Interpret VLM output (infeasible). </strong>My first instinct was to query the VLM backbone directly, e.g. by asking something like &#8220;Is there a red cube?&#8221; or &#8220;What objects are on the table?&#8221; to verify perception. This turns out not to be feasible for most architectures. In X-VLA and SmolVLA, the action head attaches to the VLM&#8217;s final hidden states and generates actions through flow matching in a continuous space, bypassing the text vocabulary entirely. You could query the underlying base VLM (e.g. 
<a href="https://huggingface.co/blog/smolvla#vision-language-model-vlm">SmolVLM2 for SmolVLA</a>) separately, but that&#8217;s not a fair proxy: fine-tuning on robot manipulation data shifts the VLM&#8217;s internal representations, so its text generation behavior no longer reflects what the VLA backbone actually sees. This technique only works cleanly in text-token VLAs like <a href="https://robot-learning-collective.github.io/vla-0-smol">VLA-0-Smol</a>, where actions are generated as autoregressive text strings from the same output head as language. There, scene description quality and action quality share a representation, so if the model produces a poor scene description, it will likely produce poor action tokens.</p></li><li><p><strong>Visualize attention on tokens.</strong> The ubiquity of transformer-based architectures means that we can leverage the <a href="https://huggingface.co/docs/transformers/en/model_doc/encoder-decoder">Hugging Face Transformers <code>output_attentions</code></a> feature to try to visualize where the vision and text encoders are spending their attention, and whether that is appropriate for the task specified. For example, if we ask it to pick up a red block, is the vision encoder indeed looking at the red block?</p></li></ol><h4>Active debugging: intervene on inputs and observe behavioral change</h4><ol><li><p><strong>Camera ablations (test whether vision is doing object detection or spatial template matching).</strong> Move the camera position, and introduce occlusions into one of the views if there are multiple. If the attention fails to track the desired object, it suggests the model learned spatial heuristics tied to camera geometry rather than object identity. 
In a classical pipeline, object detection is camera-pose-invariant by design (you&#8217;d re-project into the robot frame), but here, camera pose is baked into the learned policy implicitly through the training distribution.</p></li><li><p><strong><a href="https://www.emergentmind.com/topics/counterfactual-prompt-design">Counterfactual prompting</a> to test semantic understanding.</strong> Use variations of the prompt (e.g. red block vs. red cube) that effectively mean the same thing and observe whether the output stays consistent. Differing outputs would expose that the action head is sensitive to tokenization differences that the VLM alone would smooth over.</p></li><li><p><strong>Primitive action prompts (test the action head&#8217;s semantic understanding of motion).</strong> E.g. if &#8220;don&#8217;t move&#8221; produces as much motion as &#8220;pick up block&#8221;, it shows that the action head is always generating motion from its training distribution, rather than containing a deeper understanding of what motion is.</p></li></ol><p>I suspect that some (if not all) of these will be familiar to seasoned VLA users, but please let me know in the comments if you&#8217;re aware of a better technique &#8212; chances are that it will help many prospective and current VLA users!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/debugging-as-architecture-insight/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.avikde.me/p/debugging-as-architecture-insight/comments"><span>Leave a comment</span></a></p><h3>Debugging results</h3><p>For each experiment, I&#8217;ll write what a reasonable expectation would be, the result we see, and the resulting insight or the deeper reason why.</p><h4>Baseline: pick up the red block</h4><p>In this baseline, the attention mask on the image 
looks like it is looking at the red block as well as the gripper. The reaching trajectory output looks like it moves to directly over the red block. Overall, this looks to be a great initial result.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NfNw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NfNw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!NfNw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!NfNw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!NfNw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NfNw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/adcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:260352,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!NfNw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!NfNw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!NfNw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!NfNw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcbd2bb-f044-4699-a474-22af74e2f880_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4>Experiment 1: Picking a different block in view</h4><p><strong>Expectation: </strong>Symmetric action based on spatial understanding from multiple views</p><p><strong>Result: </strong>The visualized attention shows that it is looking at approximately the correct part of the primary image, though it appears a little offset to the outside of the block. 
The reaching action appears to not reach as far toward the blue block.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tsjO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tsjO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!tsjO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!tsjO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!tsjO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tsjO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263368,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!tsjO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!tsjO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!tsjO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!tsjO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97c36d-d997-4766-b7e3-29eb24169ee8_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>Most likely, the &#8220;3D&#8221; spatial understanding derived from the images does not match what we would expect from a precisely calibrated perception and object identification setup.</p><h4>Experiment 2: Swap blue / red positions</h4><p><strong>Expectation: </strong>Behavior symmetric to the previous experiment.</p><p><strong>Result: </strong>The blue trajectory overshoots more than the initial red block trajectory did, and the red trajectory also overshoots.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sCNd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sCNd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!sCNd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!sCNd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!sCNd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sCNd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:264576,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!sCNd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!sCNd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!sCNd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!sCNd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeef462f-30e6-4c92-b09a-10906c1f3f17_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UUKw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UUKw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 424w, 
https://substackcdn.com/image/fetch/$s_!UUKw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!UUKw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!UUKw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UUKw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:264223,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!UUKw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!UUKw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!UUKw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!UUKw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca562d78-5ab6-467a-a087-c694999a2e63_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>Spatial understanding and behavior are not symmetric where we would expect them to be, suggesting that factors such as the training data distribution have a larger effect than anticipated.</p><h4>Experiment 3: Altered primary camera view</h4><p><strong>Expectation: </strong>Same behavior as with the initial camera view.</p><p><strong>Result: </strong>The red block trajectory now exhibits the under-reaching previously seen in the blue trajectory, and vice versa.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e3lu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e3lu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!e3lu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!e3lu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 1272w, 
https://substackcdn.com/image/fetch/$s_!e3lu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e3lu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:249229,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!e3lu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!e3lu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 848w, 
https://substackcdn.com/image/fetch/$s_!e3lu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!e3lu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9873f6b2-a286-4e5b-8a72-566ab21aaa04_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!dKBg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dKBg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!dKBg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!dKBg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!dKBg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dKBg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:249035,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!dKBg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!dKBg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!dKBg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!dKBg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39edf20-dfc4-4b4c-80e4-6d3676157a5c_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>The actions are inseparably tied to the camera view and not associated with absolute spatial understanding.</p><h4>Experiment 4: Remove second camera view</h4><p><strong>Expectation: </strong>Slight degradation in performance.</p><p><strong>Result: </strong>Removing the side view has minimal effect, but removing the over-the-shoulder view has a disastrous effect on performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iFFG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!iFFG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!iFFG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!iFFG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!iFFG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iFFG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d6d97802-2a8f-4f16-b595-86523141105c_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262220,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!iFFG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!iFFG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!iFFG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!iFFG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6d97802-2a8f-4f16-b595-86523141105c_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XlS0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XlS0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!XlS0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!XlS0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!XlS0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XlS0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262235,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!XlS0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!XlS0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!XlS0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!XlS0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff13e5ffa-ec4c-4513-936e-f18d9586fce0_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>It appears that <a href="https://rail-berkeley.github.io/bridgedata/">BridgeData</a> has a disproportionately high number of trials with the over-the-shoulder view, so significantly altered viewpoints may silently produce much worse results.
</p><h4>Experiment 5: Occluded primary view</h4><p><strong>Expectation: </strong>Second view provides redundancy.</p><p><strong>Result: </strong>Trajectory moves away from the red block.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iEYg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iEYg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!iEYg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!iEYg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!iEYg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iEYg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262599,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!iEYg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!iEYg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!iEYg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!iEYg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b620c77-0e4d-47d8-97df-6378e937180c_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>The side camera view does not seem to be useful in X-VLA.</p><h4>Experiment 6: Prompt variations</h4><p><strong>Expectation: </strong>Prompts with similar meanings will produce similar actions.</p><p><strong>Result: </strong>All of these similar prompts resulted in largely similar actions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b8o3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp"
srcset="https://substackcdn.com/image/fetch/$s_!b8o3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!b8o3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!b8o3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!b8o3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b8o3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/baf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262707,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!b8o3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!b8o3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!b8o3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!b8o3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf37394-fdb2-4be7-978c-7610aa8ef36d_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8733!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8733!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!8733!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!8733!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!8733!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8733!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262482,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8733!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!8733!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!8733!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!8733!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8bf613-ea51-4096-8daa-26bd151bcb10_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UxGP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UxGP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 424w, 
https://substackcdn.com/image/fetch/$s_!UxGP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!UxGP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!UxGP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UxGP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263062,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UxGP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 
424w, https://substackcdn.com/image/fetch/$s_!UxGP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!UxGP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!UxGP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2be7fc4-d04d-4b1e-a92a-9e1ea7b99441_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IeOo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IeOo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!IeOo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!IeOo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!IeOo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IeOo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262328,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IeOo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!IeOo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!IeOo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!IeOo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031cb99c-685f-422e-afe0-b61e4ba4ff31_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>The language encoder is effective at collapsing equivalent prompts to the same actions.</p><h4>Experiment 7: Don&#8217;t move</h4><p><strong>Expectation: </strong>No motion.</p><p><strong>Result:</strong> Approximately as much motion as when asked to pick the red cube with the left shoulder view.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qWia!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!qWia!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!qWia!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!qWia!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!qWia!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qWia!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261100,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!qWia!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!qWia!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!qWia!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!qWia!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce8682a-2134-4abd-abd2-58926513d869_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>The model is still interpolating / extrapolating from training samples and does not have an explicit understanding of commands.</p><h4>Experiment 8: Change picking position</h4><p><strong>Expectation: </strong>The output trajectory moves to the modified block position.</p><p><strong>Result: </strong>Strangely, the visual attention is not on the block in the second example, but the trajectory is largely responsive to the environment change.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Dn7O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Dn7O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!Dn7O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!Dn7O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Dn7O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Dn7O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:258635,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Dn7O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!Dn7O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 848w, 
https://substackcdn.com/image/fetch/$s_!Dn7O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!Dn7O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4844314-4e4e-432a-83aa-06f120e0245a_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!PAWb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PAWb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!PAWb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!PAWb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!PAWb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PAWb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261359,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!PAWb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!PAWb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!PAWb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!PAWb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51daee6b-2412-41af-83a2-20f1b1bcb3d2_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>As long as the prompt is visually grounded, the results generalize in the expected way. 
The soft prompt for WidowX likely encodes &#8220;approach visible object&#8221; as a primitive (trained on the Bridge dataset).</p><h4>Experiment 9: Move forward / backward / up / down</h4><p><strong>Expectation:</strong> Move as asked.</p><p><strong>Result: </strong>Approximately the same motion toward the tabletop, largely uncorrelated with the prompt.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N4k4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N4k4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!N4k4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!N4k4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!N4k4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N4k4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261575,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N4k4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!N4k4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!N4k4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!N4k4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71acc4f5-b604-4b12-aedf-e9dd5b7f087e_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EiXO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EiXO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 424w, 
https://substackcdn.com/image/fetch/$s_!EiXO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!EiXO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!EiXO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EiXO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:265645,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EiXO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 
424w, https://substackcdn.com/image/fetch/$s_!EiXO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!EiXO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!EiXO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1166fd0-6293-4717-8223-e1b1fc197a88_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dNob!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dNob!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!dNob!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!dNob!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!dNob!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dNob!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png" width="1428" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:262285,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!dNob!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!dNob!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!dNob!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!dNob!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ddb5a3-c2c1-442e-8f35-8fc473bd5043_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Insight: </strong>No visual grounding for blind motions. 
The model has no spatial primitive vocabulary because VLMs are trained on image-caption pairs where &#8220;up&#8221; describes scene composition, not a direction in the robot workspace.</p><h4>Experiment 10: Move toward / away from base</h4><p><strong>Expectation: </strong>Move as instructed.</p><p><strong>Result: </strong>A discernible difference between the two trials, consistent with the instructions, suggesting some comprehension of the prompt.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gCGK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gCGK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!gCGK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!gCGK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!gCGK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gCGK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261494,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gCGK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!gCGK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!gCGK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!gCGK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9583fb-8745-44d5-9dab-c1ecb785873a_1428x495.png 1456w" sizes="100vw" 
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B8R1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!B8R1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!B8R1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!B8R1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!B8R1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B8R1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:266299,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!B8R1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!B8R1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!B8R1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 1272w, https://substackcdn.com/image/fetch/$s_!B8R1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08c4004-e3fa-40cf-9ba4-567351a6135e_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Insight: </strong>The introduction of the robot base as a (visible) target makes things significantly easier for the model compared to the previous experiment.</p><h4>Experiment 11: Move away from block</h4><p><strong>Expectation: </strong>Motion away from the block.</p><p><strong>Result: </strong>Motion largely toward the tabletop.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5mc5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5mc5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!5mc5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!5mc5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 1272w, 
https://substackcdn.com/image/fetch/$s_!5mc5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5mc5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png" width="1428" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:266131,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188827303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5mc5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 424w, https://substackcdn.com/image/fetch/$s_!5mc5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 848w, https://substackcdn.com/image/fetch/$s_!5mc5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 
1272w, https://substackcdn.com/image/fetch/$s_!5mc5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09d3aa04-81cf-4a3b-827d-081f3b8f0747_1428x495.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Insight: </strong>The word &#8220;away&#8221; is probably not having the spatial effect that it should in this context, exposing the ambiguity inherent in using language for robot programming. 
Like it or not, unless the language model is very large, it is safer to assume that the prompt effectively indexes into or extrapolates from the training data, and that positional prepositions (commonly used by humans to communicate spatial commands) are unreliable.</p><h3>What the experiments reveal about current VLAs</h3><p><strong>1. Camera view is entangled with the behavior rather than being a calibrated parameter.<br></strong>Experiments 2, 3, 4, and 5 collectively show that the model&#8217;s spatial behavior is tied to the training distribution&#8217;s camera geometry rather than to a camera-pose-invariant object representation. Swapping shoulders changes reach distance; replacing the over-shoulder view with a side view breaks the policy entirely even though the scene is identical. This follows for any VLA trained end-to-end without explicit camera calibration. The practical implication is that deployment requires camera placement matching the training distribution, and that the model will fail silently when out of distribution.</p><p><strong>2. The action manifold is object-centric, not spatially general.<br></strong>Experiments 7, 9, 10, and 11 collectively show that the model has no spatial primitive vocabulary independent of objects. &#8220;Move up/forward/back&#8221; all produce similar grasping-like motions; &#8220;don&#8217;t move&#8221; produces motion; &#8220;move away from block&#8221; produces motion largely toward the tabletop rather than away from the block. &#8220;Move toward/away from base&#8221; works only because the base is a visually grounded object in the scene. This generalizes beyond X-VLA: any VLA at this scale trained predominantly on pick-and-place demonstrations will have an action manifold that approximates &#8220;move toward salient object and grasp.&#8221; Spatial relation commands only work when they can be reduced to object identity. 
This has a direct safety implication: you cannot issue a recovery command (&#8220;stop,&#8221; &#8220;move away,&#8221; &#8220;back off&#8221;) and expect it to override the trained behavioral prior.</p><p><strong>3. VLAs at this scale appear to lack compositional generalization.</strong><br>Experiments 7, 9, and 11 show that novel combinations of spatial primitives and objects (even using vocabulary the model demonstrably knows) produce behavior dominated by the training distribution rather than the instruction. This is distinct from the question of whether larger VLAs generalize better, which is likely true but out of scope for this article. It does suggest, however, that for sub-1B-parameter VLAs, natural language commands are most reliable when they closely match the task distribution the model was trained on, which significantly narrows the practical definition of &#8220;zero-shot generalization&#8221; for deployment.</p><h3>Closing thoughts</h3><p>For flow-matching VLAs like X-VLA, the classical debugging question &#8220;is this a vision problem or a control problem?&#8221; is not just difficult to answer but structurally unanswerable. End-to-end training eliminates the interfaces that would make the question meaningful.</p><p>The debugging ideas presented here offer partial remedies: passive inspection via attention visualization and active intervention via camera ablations and language variation. These experiments also surfaced three concrete findings: spatial understanding is tied to training-distribution camera geometry rather than calibrated object pose; the action manifold is object-centric and lacks spatial primitive vocabulary; and compositional generalization breaks down for novel combinations of known concepts. 
These echo the <a href="https://open.substack.com/pub/aisnakeoil/p/new-paper-towards-a-science-of-ai?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">reliability concerns of consistency, robustness, predictability, and safety</a> that are crucial to evaluating progress in robotics.</p><p>None of this diminishes what VLAs actually deliver: flexible task programming and meaningful robustness to environmental variation, without any robot-specific programming. The path to reliable deployment is to augment the strengths of VLAs with explicit interfaces for safety constraints, to reduce complexity by using known tools for camera and kinematics calibration, and to detect out-of-distribution inputs.</p><p>In the next part, we will close the loop with this demo&#8217;s action outputs to try to leverage the strengths of VLAs in conjunction with low-level control ideas from parts 1 and 2.</p><p>If you liked this kind of analysis, please subscribe for future posts, and thanks for reading!</p><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>In fact, as mentioned above, the slightly variable trajectory start point makes me suspect some error, but it&#8217;s quite difficult to narrow down further, even after checking <a href="https://github.com/2toinf/X-VLA?tab=readme-ov-file#5%EF%B8%8F%E2%83%A3-standardized-control-interface-ee6d">the documentation</a> and opening <a href="https://huggingface.co/lerobot/xvla-widowx/discussions/2">an issue</a>. 
However, this isn&#8217;t a fundamental VLA issue and I&#8217;m going to put it aside for this article.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[What Wiener knew about (artificial) intelligence in 1948]]></title><description><![CDATA[Cybernetics anticipated feedback, structure, and the human stakes of machine intelligence with unsettling precision]]></description><link>https://www.avikde.me/p/what-wiener-knew-about-artificial</link><guid isPermaLink="false">https://www.avikde.me/p/what-wiener-knew-about-artificial</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Sat, 21 Feb 2026 16:00:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!cQgM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As evidenced by my <a href="https://www.avikde.me/p/what-von-neumann-understood-about">prior post on von Neumann</a>, I believe it&#8217;s crucial to integrate historical context and cross-disciplinary knowledge at this pivotal period of technological change. 
On a recommendation, I read Norbert Wiener&#8217;s <em>Cybernetics</em>, which was published even earlier and is another pillar of the founding moment of the information age.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cQgM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cQgM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 424w, https://substackcdn.com/image/fetch/$s_!cQgM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 848w, https://substackcdn.com/image/fetch/$s_!cQgM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 1272w, https://substackcdn.com/image/fetch/$s_!cQgM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cQgM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp" width="600" height="450" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Norbert Wiener, mathematician and founder of cybernetics.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Norbert Wiener, mathematician and founder of cybernetics." title="Norbert Wiener, mathematician and founder of cybernetics." srcset="https://substackcdn.com/image/fetch/$s_!cQgM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 424w, https://substackcdn.com/image/fetch/$s_!cQgM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 848w, https://substackcdn.com/image/fetch/$s_!cQgM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 1272w, https://substackcdn.com/image/fetch/$s_!cQgM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277ff110-b9e0-4314-b186-97335adb0a69_600x450.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Norbert Wiener, the founder of cybernetics (<a href="https://loff.it/society/efemerides/norbert-wiener-matematico-fundador-de-la-cibernetica-216189/">Image source</a>)</figcaption></figure></div><p>Wiener was a child prodigy who received a PhD from Harvard by age 18 and went on to join the MIT mathematics faculty. 
By the account of <em>Dark Hero of the Information Age</em>, the biography by Flo Conway and Jim Siegelman, he was simultaneously one of the most intellectually alive and emotionally turbulent figures in twentieth-century science: touched by manic-depressive episodes and feuds with colleagues, yet capable of a mathematical breadth that few of his contemporaries could match.</p><p>That breadth is visible in the book he published in 1948: <em>Cybernetics, or Control and Communication in the Animal and the Machine</em>. Its thesis was that information flow and message-passing are central to control and communication in both animals and machines. It appeared the same year as Shannon&#8217;s &#8220;A Mathematical Theory of Communication&#8221; and the year before Shockley&#8217;s transistor paper. Wiener was at the center of the founding of the information age, and yet he has been largely forgotten amid recent technological development. His legacy was overshadowed by Shannon, who had the more implementable theory, and by von Neumann, who had the more implementable architecture.</p><p>Reading <em>Cybernetics</em> now, almost 80 years later, is awe-inspiring and unsettling in equal measure. It is mathematically dense in places and dated in others, but the program it laid out is strikingly relevant to modern AI development. Here are the ideas from it that I found most resonant.</p><p></p><h3>1. Feedback</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YoNn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YoNn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 424w, https://substackcdn.com/image/fetch/$s_!YoNn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 848w, https://substackcdn.com/image/fetch/$s_!YoNn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 1272w, https://substackcdn.com/image/fetch/$s_!YoNn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!YoNn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png" width="404" height="336.003937007874" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:845,&quot;width&quot;:1016,&quot;resizeWidth&quot;:404,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Watt&#8217;s flyball governor&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Watt&#8217;s flyball governor" title="Watt&#8217;s flyball governor" srcset="https://substackcdn.com/image/fetch/$s_!YoNn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 424w, https://substackcdn.com/image/fetch/$s_!YoNn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 848w, https://substackcdn.com/image/fetch/$s_!YoNn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 1272w, https://substackcdn.com/image/fetch/$s_!YoNn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640db9b2-c4c4-404e-9344-aeca73b78c80_1016x845.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Watt&#8217;s flyball governor (<a href="https://en.wikipedia.org/wiki/Centrifugal_governor">image source</a>)</figcaption></figure></div><p>&#8220;Cybernetics&#8221; originates from the Ancient Greek <strong>&#954;&#965;&#946;&#949;&#961;&#957;&#942;&#964;&#951;&#962; (kybern&#275;t&#275;s)</strong>, meaning &#8220;steersman,&#8221; the same root that, via Latin, gave us the word &#8220;governor.&#8221; It is perhaps no coincidence that Maxwell&#8217;s <a href="https://www.jstor.org/stable/112510">paper on governors</a> was the first known exposition on feedback control. 
I don&#8217;t need to elaborate on the value of feedback in modern technology, but two nontrivial leaps Wiener makes are worth highlighting.</p><p>First, he connects the engineering of communication and control to neurology. The feedback loop (sense the error, apply a correction, repeat) describes voluntary movement in biological systems. When this feedback is damaged, as in cerebellar injury, the result is a tremor or oscillation: too aggressive a correction followed by too aggressive a counter-correction. This convergence of engineering control theory and neurology was a founding observation of cybernetics: the same mathematics governs servomechanisms and nervous systems.</p><p>The second leap is the identification of a fundamental tradeoff: <strong>do you invest in modeling or in feedback?</strong> Wiener&#8217;s answer depends on how constant and knowable your system is. He called systems that leverage explicit models <em>compensators</em>, contrasting them with pure feedback mechanisms. In today&#8217;s terms, Wiener&#8217;s compensator needs a world model: an internal representation of how the system behaves that allows it to act without waiting for error to accumulate. The model vs. feedback tradeoff he identified is strongly echoed in the debate now playing out between <a href="https://www.avikde.me/p/the-ai-world-models-debate-and-its">scaling-based and structured AI architectures</a>, not to mention <a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">in robotics</a>. Model-free reinforcement learning is a direct descendant of the feedback side of this tradeoff: an agent interacts with an environment, receives a reward signal reflecting the gap between its behavior and a desired outcome, and adjusts its policy accordingly.</p><h3>2. Neuron structure: digital vs. 
analog</h3><p>Wiener asks in the book: in what ways are the computational substrates of brains and machines alike, and in what ways are they fundamentally different?</p><p>Wiener&#8217;s first observation is that neurons obey an &#8220;all-or-none&#8221; law (they fire fully or not at all) and in this sense are digital. This is in tension with von Neumann&#8217;s later analysis, covered in a <a href="https://www.avikde.me/p/what-von-neumann-understood-about">prior post</a>: von Neumann argued that individual neurons function more like small analog computers, with temporal dynamics and nonlinear integration beyond what a simple threshold element can do. The understanding of neuronal computation has deepened considerably since both accounts, and the honest answer is that neurons are neither purely one nor the other.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3zC7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3zC7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 424w, https://substackcdn.com/image/fetch/$s_!3zC7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 848w, https://substackcdn.com/image/fetch/$s_!3zC7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3zC7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3zC7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png" width="550" height="373.4113712374582" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:609,&quot;width&quot;:897,&quot;resizeWidth&quot;:550,&quot;bytes&quot;:111767,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188648218?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3zC7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 424w, https://substackcdn.com/image/fetch/$s_!3zC7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 848w, https://substackcdn.com/image/fetch/$s_!3zC7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 
1272w, https://substackcdn.com/image/fetch/$s_!3zC7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c3d404c-2574-45b9-8357-17e71dafdbdb_897x609.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">McCulloch-Pitts neuron models (1943) referenced by Wiener</figcaption></figure></div><p>What followed from the digital view, at least in the engineering tradition, was eventually deep learning: stack enough simple threshold units to sufficient depth, and powerful computation emerges. 
But as the next sections argue, Wiener himself was skeptical that the generic stacking of simple units was sufficient.</p><h3>3. Neuron organization: flexible vs. dedicated</h3><p>Wiener&#8217;s point is that even if neurons are digital in their firing, their <em>organization</em> is anything but generic. He writes:</p><blockquote><p>The structure of our visual cortex is too highly organized, too specific, to lead us to suppose that it operates by what is after all a highly generalized mechanism.</p></blockquote><p>As a mathematician, he frames this in terms of group theory: the visual system is built to be invariant under transformations of position, rotation, scale, and illumination. Image recognition is a comparison of structural properties that persist across those transformations, not a comparison of raw photoreceptor signals. The retina has broadly distributed, low-resolution rod cells and foveally concentrated cones, and the layers beyond it extract features at multiple spatial frequencies in parallel. Structure encoded by biology is doing work from the very first stage.</p><p>The unifying point is that the brain does not apply a general-purpose function to raw sensory data and let structure emerge. It applies a pipeline in which each stage is specifically organized to extract the right kind of information. Most modern vision architectures assume that this structure will emerge from scale and data; capsule networks, group-equivariant CNNs, and similar approaches attempt to encode it explicitly but remain outside the mainstream. This is the same tension at the heart of the world models debate: whether sufficient scale applied to a general architecture will recover the structure that biology built in deliberately, or whether that structure ought to be encoded explicitly.</p><h3>4. 
The switchboard analogy</h3><p>Wiener is next interested in how neurons are organized, and here his analysis diverges sharply from the digital computer model he was comparing against.</p><p>A digital computer of his era had specific circuits for specific operations: an adder, a multiplier, a comparator, each doing one thing reliably and repeatedly. The brain, he argues, does not work this way. Rather than dedicated permanent circuits, the brain reconfigures its functional connections dynamically, routing signals through different pathways depending on context. He uses the telephone switchboard as his analogy: the same physical wires serve different conversations depending on how the exchange is configured at any moment.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!w04J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!w04J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 424w, https://substackcdn.com/image/fetch/$s_!w04J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 848w, https://substackcdn.com/image/fetch/$s_!w04J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 1272w, 
https://substackcdn.com/image/fetch/$s_!w04J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!w04J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png" width="805" height="201" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:201,&quot;width&quot;:805,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:79768,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188648218?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc01eaa6-0907-4e03-8d09-8cd53e6404b0_898x246.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!w04J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 424w, https://substackcdn.com/image/fetch/$s_!w04J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 848w, https://substackcdn.com/image/fetch/$s_!w04J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 1272w, 
https://substackcdn.com/image/fetch/$s_!w04J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4075b01e-2f43-4335-9c52-e73c5e521fc8_805x201.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">An image is formed on the photocells (bottom), but then flexibly connected to processing at different size scales (dotted lines). Original image source: <em>Cybernetics.</em></figcaption></figure></div><p>He makes this concrete with a visual processing example depicted above: recognizing a letter regardless of its size (a large &#8220;A&#8221; and a small &#8220;A&#8221;) with a fixed array of photocells. His proposed solution uses a switchable connection layer between the photocell array (bottom) and a fixed set of processing elements (top). By selecting different connection patterns (the diagonal lines), photocell activations at different scales get mapped onto the same processing elements, achieving scale invariance through reconfigurable routing rather than through a learned function.  In deep learning perception, this is similar to ideas like spatial pyramid pooling or adaptive pooling.</p><p>In contrast, a vision transformer applies the same operation at every layer to every token, with flexibility coming entirely from learned weights at massive scale. There is no dynamic routing or reconfiguration based on the nature of the input. Wiener pointed out that this approach carries a cost: a large fixed architecture must run in its entirety even when most of it is irrelevant to the current input. A 175B parameter model processing a simple query still activates the full machinery, paying the energy and latency cost of elements that contribute nothing to that particular computation.</p><p>Some modern work moves toward Wiener&#8217;s direction. 
Mixture-of-experts architectures route inputs to specialized sub-networks rather than running everything; sparse transformers use dynamic attention patterns; early-exit networks use only as much compute as the input requires. These remain the exception rather than the rule, but each is, in a real sense, an implementation of the switchboard principle Wiener described in 1948.</p><h3>5. Spatial efficiency: foveation</h3><p>A thread running through Wiener&#8217;s treatment of vision is that the brain achieves capable perception not by processing everything uniformly and in parallel, but by being strategically non-uniform in both space and time.</p><p>The spatial side is foveation. The fovea provides high-resolution detail while the periphery offers broad, low-resolution motion detection. The brain does not passively receive a full image; it actively steers the fovea toward informative regions via saccades, driven by a continuous feedback loop. The implication is that high-resolution processing is a scarce resource allocated dynamically, not applied uniformly.</p><h3>6. Temporal efficiency: the television analogy</h3><p>The temporal side is more surprising. Wiener observes that the brain may serialize what would otherwise require parallel hardware, using alpha waves (the ~10 Hz electrical rhythms visible in EEGs) as a scanning clock. Just as a television converts a two-dimensional image into a sequential stream by sweeping line by line, the brain may sweep through its representational space cyclically, interrogating stored patterns at each clock cycle. 
The efficiency principle is time-multiplexing: reuse the same hardware over time rather than duplicate it in space.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IyQF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IyQF!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 424w, https://substackcdn.com/image/fetch/$s_!IyQF!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 848w, https://substackcdn.com/image/fetch/$s_!IyQF!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 1272w, https://substackcdn.com/image/fetch/$s_!IyQF!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IyQF!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif" width="400" height="342" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:342,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IyQF!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 424w, https://substackcdn.com/image/fetch/$s_!IyQF!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 848w, https://substackcdn.com/image/fetch/$s_!IyQF!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 1272w, https://substackcdn.com/image/fetch/$s_!IyQF!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108cfeb-8554-4d19-ab78-acc0aa4ddc3e_400x342.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Progressive scanning in a TV (<a href="https://msys-mv.blogspot.com/2010/11/understanding-basic-requirements-to.html">image source</a>)</figcaption></figure></div><p>Together these describe a coherent alternative to the architecture modern AI has converged on. Transformers process all positions in a spatially uniform and temporally instantaneous manner, which is expensive in both compute and energy. Biology does neither: it allocates spatial resolution selectively and serializes computation over time. Foveation-inspired architectures (glimpse networks, recurrent attention models) and ideas like conditional computation point in this direction but remain outside the mainstream, largely because uniform dense operations map cleanly onto GPU hardware. Wiener&#8217;s architectural intuitions may become increasingly relevant if the AI energy crisis makes the efficiency argument more economically compelling.</p><h3>7. 
Avoiding blunders: redundancy and verification</h3><p>The brain produces behavior of remarkable precision despite individual neurons being surprisingly unreliable: they fire spontaneously, transmit probabilistically, and have far worse signal-to-noise ratios than transistors. Wiener&#8217;s answer, developed in the psychopathology chapter, is that there are two complementary strategies for error correction. The first is the &#8220;<a href="https://englishverse.com/poems/the_hunting_of_the_snark">what I tell you three times is true</a>&#8221; strategy: running two or three computing mechanisms simultaneously on the same problem, so that an error shows up as a disagreement among the parallel channels. The second is backtracking: sequential verification where the system checks its own output and revises when something goes wrong. One is spatial (parallel redundancy) and the other is temporal (serial correction); this is the same tradeoff from before, now applied to reliability rather than perception.</p><p>This maps directly onto one of the most discussed failure modes in LLMs: hallucinations. Wiener&#8217;s analysis suggests that they are the expected behavior of a system optimized for speed without redundancy or verification, not simply a quirk to be patched. A single forward pass through a transformer produces an answer with no mechanism for catching its own errors. 
Reasoning models which iterate, self-check, and backtrack are exploiting exactly the reliability/overhead tradeoff Wiener described:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7vb_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7vb_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 424w, https://substackcdn.com/image/fetch/$s_!7vb_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 848w, https://substackcdn.com/image/fetch/$s_!7vb_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 1272w, https://substackcdn.com/image/fetch/$s_!7vb_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7vb_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png" width="1456" height="498" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:498,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:765303,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/188648218?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7vb_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 424w, https://substackcdn.com/image/fetch/$s_!7vb_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 848w, https://substackcdn.com/image/fetch/$s_!7vb_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 1272w, https://substackcdn.com/image/fetch/$s_!7vb_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0d1acee-6220-4870-bab2-b5e675cecf62_2038x697.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Backtracking in DeepSeek R1 reasoning model (<a href="https://www.reddit.com/r/LocalLLaMA/comments/1id2gox/improving_deepseek_r1_reasoning_trace/">image source</a>, highlights mine)</figcaption></figure></div><p>But verification has limits. As I argued in the <a href="https://www.avikde.me/p/the-ai-world-models-debate-and-its">world models post</a>, a system that lacks a grounded semantic model of the world can cross-check its outputs without ever catching the deeper class of errors that stem from not understanding what it&#8217;s talking about.</p><h2>The human use of human beings</h2><p>Wiener was clearly one of the founders of the information age, but he was also deeply worried about what was being built. 
A passage from his follow-up book <em>The Human Use of Human Beings</em> reads like something written last week:</p><blockquote><p><em>The first industrial revolution was the devaluation of the human arm by the competition of machinery. The modern industrial revolution is similarly bound to devalue the human brain, at least in its simpler and more routine decisions. The average human being of mediocre attainments or less has nothing to sell that it is worth anyone&#8217;s money to buy.</em></p></blockquote><p>He was not predicting this as an inevitable law of nature. His proposed answer was equally striking: rather than trying to preserve the market value of human labor artificially, he argued that society would need to restructure itself around non-market values like dignity, community, creativity, and meaning. He wrote letters to labor unions warning them of what was coming, but his warnings went unheeded.</p><p>In 2026, AI systems are beginning to perform, inexpensively, many of the cognitive tasks (writing, coding, analysis, translation, legal research) that defined middle-class professional employment in the twentieth century. The policy infrastructure to manage this transition does not exist. The urgency Wiener felt in 1950, when he had no working computer to point to, is more justified now.</p><p>Brian Christian, in his introduction to the recent reissue, <a href="https://brooklinebooksmith.com/book/9780063423190">calls Wiener</a> &#8220;the progenitor of contemporary AI safety discourse.&#8221; That may be the most accurate short description of the man. He was not a pessimist or a technophobe; he was a technologist who had thought seriously about what he was building and felt obligated to say what it implied. 
That combination of technical depth, ethical seriousness, and willingness to deliver uncomfortable conclusions publicly is just one more reason to read and remember him.</p><p>I&#8217;m a strong proponent of reading and non-echo-chamber thinking. If you know of any other writing of this ilk, please let me know in the comments. 
If you liked this post, please share it, and subscribe!</p><div><hr></div><p><em>This post draws primarily on &#8220;Cybernetics: Or Control and Communication in the Animal and the Machine&#8221; and the biography &#8220;Dark Hero of the Information Age&#8221;. It continues themes from previous posts on von Neumann and world models in AI.</em></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;e2f67af9-7d5c-4dc9-aaae-e68cc06abe79&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What von Neumann understood about the architecture of intelligence before we built AI&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-19T19:17:48.188Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!_hYZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/what-von-neumann-understood-about&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:185086427,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;721822df-bdf5-4011-b9b2-b1be2d6818f1&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The AI world models debate and its foreshadowing on robotics&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Safe, efficient robotics &amp; AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!E5et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-14T08:18:52.656Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Xry4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-ai-world-models-debate-and-its&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:184309659,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:4,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[Cache effects in object-oriented code: computer architecture meets programming]]></title><description><![CDATA[A simple demonstration revealed five layers of computer science & engineering abstraction fighting each other]]></description><link>https://www.avikde.me/p/cache-effects-in-object-oriented</link><guid isPermaLink="false">https://www.avikde.me/p/cache-effects-in-object-oriented</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Tue, 10 Feb 2026 15:30:18 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!p-27!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Having worked in robotics research and industry for over a decade, I&#8217;ve debugged enough real-time control loops to know that the programming language abstraction can be misleading. We write object-oriented code because it&#8217;s maintainable, composable, and maps cleanly to our mental models. A robot has limbs, limbs have joints, joints have positions and velocities, so we should create a hierarchy of objects accordingly, right?</p><p>When battery life is crucial, and when microseconds matter to ensure control loops remain stable, the hardware doesn&#8217;t care about elegant class hierarchy or beautiful code. The end-product of programming is <a href="https://www.youtube.com/watch?v=fHNmRkzxHWs">data transformation</a>, and not the code itself.</p><p>This post, written with my friend <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Greg Anderson&quot;,&quot;id&quot;:61562392,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b36e1378-3607-4d80-ba5f-2afa31a28123_144x144.png&quot;,&quot;uuid&quot;:&quot;18e87c2b-6325-4808-8804-0a4f47210032&quot;}" data-component-name="MentionToDOM"></span> (software engineer and CS lecturer), started as a simple teaching example about Array-of-Structures vs Structure-of-Arrays (AoS vs. SoA) layouts. 
We thought we&#8217;d show a clean and universal performance curve demonstrating cache effects tied to C++ code.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p-27!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p-27!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 424w, https://substackcdn.com/image/fetch/$s_!p-27!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 848w, https://substackcdn.com/image/fetch/$s_!p-27!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!p-27!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p-27!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg" width="732" height="755" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:755,&quot;width&quot;:732,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121960,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p-27!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 424w, https://substackcdn.com/image/fetch/$s_!p-27!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 848w, https://substackcdn.com/image/fetch/$s_!p-27!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!p-27!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e89b29-bc5a-4d7c-8d47-a50a93e8ec02_732x755.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The memory hierarchy levels that one of the CPU 2 cores would access (excluding the system level; L1 is inside the core). <a href="https://wccftech.com/a16-bionic-die-shot-details/">Original image source</a>.</figcaption></figure></div><p>We built what seemed like a straightforward benchmark: measure access time for different memory strides. What we didn&#8217;t anticipate was running into five distinct issues spanning multiple abstraction layers, from compiler behavior to microarchitecture to hardware characteristics:</p><ol><li><p>The compiler deleted our measurement code and unpredictably stored variables in memory vs. 
registers</p></li><li><p>The CPU&#8217;s pipeline hazards dominated our memory access time</p></li><li><p>The CPU&#8217;s dynamic frequency scaling skewed our results</p></li><li><p>The hardware prefetcher made our predictions wrong</p></li><li><p>Different processors gave wildly different results</p></li></ol><p>This illustrates the gap between abstraction and performance. Programming languages provide abstraction above the hardware, but achieving good performance requires understanding how code executes on the underlying architecture. While some of our issues may be familiar to experienced programmers, others might be surprising even to veterans.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h3>Four examples showing why you should care</h3><p>There are many real-world examples using an "array of structures" organization for good reasons: it's faster to prototype, easier to reason about when objects manage their own state, and typically more readable for developers.</p><p><strong>Example 1: PCL (Point Cloud Library) </strong><a href="https://pointclouds.org/documentation/point__types_8hpp_source.html">PointXYZRGB</a> structure.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ckiW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ckiW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 424w, https://substackcdn.com/image/fetch/$s_!ckiW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 848w, https://substackcdn.com/image/fetch/$s_!ckiW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ckiW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ckiW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png" width="1456" height="501" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf47b546-6854-4917-bc18-d35443322840_2184x752.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:501,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:139001,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ckiW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 424w, https://substackcdn.com/image/fetch/$s_!ckiW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 848w, https://substackcdn.com/image/fetch/$s_!ckiW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 
1272w, https://substackcdn.com/image/fetch/$s_!ckiW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf47b546-6854-4917-bc18-d35443322840_2184x752.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>When you have <code>pcl::PointCloud&lt;PointXYZRGB&gt;</code> with millions of points, the memory layout looks like</p><pre><code>[x0, y0, z0, pad, rgb0, x1, y1, z1, pad, rgb1, ...]</code></pre><p>For an example task of filtering by distance (operating on the xyz only), we get 40% extra cache misses. 
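</p><p>As a rough sketch of where figures like these can come from, the snippet below compares bytes fetched with bytes used for the layout above. <code>PointAoS</code> is a hypothetical 20-byte stand-in that mirrors the five fields shown, not the actual PCL type (whose padding may differ), and <code>usefulFraction</code> is a helper name made up for illustration.</p><pre><code>#include &lt;cstddef&gt;

// Hypothetical stand-in for the layout sketched above: five 4-byte fields.
// The real pcl::PointXYZRGB may be padded differently.
struct PointAoS {
    float x, y, z;
    float pad;
    float rgb;  // packed color stored in a 4-byte slot
};

// Assumption of this sketch: 4-byte floats and no extra padding.
static_assert(sizeof(PointAoS) == 20, "sketch assumes a 20-byte point");

// Fraction of each fetched struct that a task actually reads, assuming
// whole structs stream through the cache.
constexpr double usefulFraction(std::size_t bytesUsed) {
    return static_cast&lt;double&gt;(bytesUsed) / sizeof(PointAoS);
}

// A distance filter reads x, y, z: 12 of 20 bytes.
constexpr double xyzFraction = usefulFraction(3 * sizeof(float));
// Color segmentation reads only rgb: 4 of 20 bytes.
constexpr double rgbFraction = usefulFraction(sizeof(float));</code></pre><p>Under these assumptions, the distance filter uses 60% of every struct it pulls through the cache (40% is wasted), while a pass that reads only rgb uses 20%, so it moves five bytes for every byte it needs.</p><p>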
A color segmentation task operating on rgb only would see 4x extra cache misses.</p><p><strong>Example 2: Unity <a href="https://docs.unity3d.com/510/Documentation/Manual/TheGameObject-ComponentRelationship.html">GameObject-Component System</a></strong>. GameObjects directly contain Component instances by value, e.g. a GameObject with Transform, Rigidbody, and Collider components stored as member data. This is classic AoS: each GameObject owns its component data, which gives flexible composition but poor cache locality when iterating over many objects.</p><p><strong>Example 3: Box2D (version 2.x). </strong>Each <code>b2Body</code> contains position, velocity, and force data as members (e.g. <code>b2Vec2 m_linearVelocity</code>). Most traditional object-oriented game engines before the <a href="https://cowboyprogramming.com/2007/01/05/evolve-your-heirachy/">ECS trend</a> used composition with value semantics: each enemy/player/NPC object contained all its data directly. However, Box2D v3.0 (2024) moved away from this, and now uses handle-based IDs and stores body data separately for better performance.</p><p><strong>Example 4: Humanoid joints. 
</strong>Last but not least, here is a practical example of a humanoid robot joint that should be quite relatable:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_KTd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_KTd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 424w, https://substackcdn.com/image/fetch/$s_!_KTd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 848w, https://substackcdn.com/image/fetch/$s_!_KTd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 1272w, https://substackcdn.com/image/fetch/$s_!_KTd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_KTd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png" width="1456" height="621" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:621,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:223276,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_KTd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 424w, https://substackcdn.com/image/fetch/$s_!_KTd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 848w, https://substackcdn.com/image/fetch/$s_!_KTd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 1272w, https://substackcdn.com/image/fetch/$s_!_KTd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd56c5d91-ce71-44ab-a05a-e86208a295ad_2184x932.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A <code>Limb</code> would be composed of <code>Joint</code>s, with <code>Joint</code> specialized into different joint types.</p><p>Suppose the humanoid robot has 50 joints. Computing Jacobians requires accessing each joint&#8217;s position: with this layout, about 5.5 kB is loaded into cache (87 cache misses), when we only need 200 bytes (4 cache misses if organized as an array of positions).</p><p>Now that we have shown that this organization is common, we will dig in and try to measure its effect.</p><h3>An even simpler example to dig into</h3><p>We created an even simpler example with a single data array and a parameterized &#8220;stride&#8221; for a strided access pattern. The joint example above corresponds to <code>stride = sizeof(Joint)</code>. 
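</p><p>To make the stride picture concrete, here is a small sketch that counts how many distinct cache lines a strided walk touches. It assumes 64-byte lines, 4-byte elements, and a buffer that starts on a line boundary; <code>linesTouched</code> is a hypothetical helper written for this illustration, not part of the benchmark repository.</p><pre><code>#include &lt;cstddef&gt;

// Hypothetical helper: counts the distinct cache lines touched when reading
// `count` elements of `elemBytes` each, spaced `strideBytes` apart.
// Assumes the buffer starts on a line boundary and strideBytes &gt;= elemBytes.
constexpr std::size_t linesTouched(std::size_t count, std::size_t strideBytes,
                                   std::size_t elemBytes = 4,
                                   std::size_t lineBytes = 64) {
    std::size_t lines = 0;
    std::size_t nextUncounted = 0;  // lowest line index not yet counted
    for (std::size_t i = 0; i &lt; count; ++i) {
        const std::size_t first = (i * strideBytes) / lineBytes;
        const std::size_t last = (i * strideBytes + elemBytes - 1) / lineBytes;
        const std::size_t from = first &gt; nextUncounted ? first : nextUncounted;
        if (last &gt;= from) {
            lines += last - from + 1;
            nextUncounted = last + 1;
        }
    }
    return lines;
}</code></pre><p>Under these assumptions, <code>linesTouched(50, 4)</code> is 4 (fifty tightly packed 4-byte values), <code>linesTouched(50, 32)</code> is 25, and from a 64-byte stride onward every access lands on its own line, so the count stays at 50.</p><p>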
Our goal was to time how long it takes to access a fixed number of elements with different strides, as in the code below.</p><p><em>The actual code for replicating all these measurements, and more, is <a href="https://github.com/avikde/caching-tester">on github</a>: feel free to try it out.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!avv2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!avv2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 424w, https://substackcdn.com/image/fetch/$s_!avv2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 848w, https://substackcdn.com/image/fetch/$s_!avv2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 1272w, https://substackcdn.com/image/fetch/$s_!avv2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!avv2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png" width="1456" height="1390" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1390,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:451153,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!avv2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 424w, https://substackcdn.com/image/fetch/$s_!avv2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 848w, https://substackcdn.com/image/fetch/$s_!avv2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 1272w, https://substackcdn.com/image/fetch/$s_!avv2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0a4048-5617-4744-acb9-4b835a43b592_2296x2192.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What we expect to see:</strong> As we access data, the processor loads it from main memory into cache in fixed-size blocks (cache lines).</p><pre><code>Data:    |x| | | | |x| |...
          &#8592; stride &#8594;
Cache:   |y|y|y|y|y|y|y|y|z|z|...
          &#8592; line size  &#8594;</code></pre><p>As stride increases, visiting the same number of elements requires caching more blocks. If memory movement dominates, we expect a linear rise in time as stride increases and more cache lines are touched. (More below on what happens once each access hits a separate cache line.)</p><p>Understanding the results from this "simple" example felt like peeling endless layers of an onion, but was very gratifying at the same time!</p><h4>Issue 1: Controlling compiler optimizations</h4><p>With the code snippet above, the <a href="https://godbolt.org/z/bxdMzcn4c">assembler output</a> showed:</p><pre><code>testStride(unsigned long):
        ret
data:
        .zero   256000000</code></pre><p>Of course! <code>sink</code> was never read, so the compiler optimized it out along with the entire loop, and my firmware programming background led me to add <code>volatile</code> to its declaration. However, something in the asm output for the loop looked amiss. Can you spot it?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HWZZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HWZZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 424w, https://substackcdn.com/image/fetch/$s_!HWZZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 848w, https://substackcdn.com/image/fetch/$s_!HWZZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!HWZZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HWZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png" width="1456" height="673" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:673,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:216623,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HWZZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 424w, https://substackcdn.com/image/fetch/$s_!HWZZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 848w, https://substackcdn.com/image/fetch/$s_!HWZZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!HWZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb40ac46-901f-409f-9694-82ecb41de011_2404x1112.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>While data should be loaded from memory to a register, sink should be able to remain in a register. However, volatile forces it to be loaded and stored because the compiler must assume that it can be modified externally. 
So we get rid of volatile, and uncomment the last line:</p><pre><code>if (sink == -1.0f) std::cout &lt;&lt; "";</code></pre><p>The new loop looks like</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DZpq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DZpq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 424w, https://substackcdn.com/image/fetch/$s_!DZpq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 848w, https://substackcdn.com/image/fetch/$s_!DZpq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 1272w, https://substackcdn.com/image/fetch/$s_!DZpq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DZpq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png" width="1456" height="573" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:573,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:159673,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DZpq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 424w, https://substackcdn.com/image/fetch/$s_!DZpq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 848w, https://substackcdn.com/image/fetch/$s_!DZpq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 1272w, https://substackcdn.com/image/fetch/$s_!DZpq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d29b301-f48d-4272-ab5e-d36a2bad441d_2368x932.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Compared to the assembly above, the extra load and store are gone: first mystery solved.</p><p><em>Issue source: compiler / programming language</em></p><h4>Issue 2: Data dependency hazard</h4><p>The relevant part of the loop looked like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XNLx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!XNLx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 424w, https://substackcdn.com/image/fetch/$s_!XNLx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 848w, https://substackcdn.com/image/fetch/$s_!XNLx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 1272w, https://substackcdn.com/image/fetch/$s_!XNLx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XNLx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png" width="1456" height="652" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:652,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:138816,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!XNLx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 424w, https://substackcdn.com/image/fetch/$s_!XNLx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 848w, https://substackcdn.com/image/fetch/$s_!XNLx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 1272w, https://substackcdn.com/image/fetch/$s_!XNLx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc16f2b15-20c0-4a02-a26f-c29deb8a4b0e_2080x932.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Timing this loop as we varied stride showed that for the first few strides, increasing stride had <em>no effect on the time</em> (solid lines in the plot below). With an Apple M2:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/zv0Qi/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1380f094-4597-42e7-aea9-c0e8e7288f63_1220x782.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e308d11-af64-4677-9824-988cd411b049_1220x852.png&quot;,&quot;height&quot;:418,&quot;title&quot;:&quot;Accumulate - M2 - Time(us) vs. Stride(floats)&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/zv0Qi/2/" width="730" height="418" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>With the size of our loop, increasing stride definitely means more cache lines are touched, but it is making no difference. 
What&#8217;s going on?</p><p>Let&#8217;s look back at <a href="https://godbolt.org/z/GqKzdPf7h">the assembly</a> (same as the previous snippet). </p><p>If we manually unroll a few iterations, we have the following pattern:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HVjb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HVjb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 424w, https://substackcdn.com/image/fetch/$s_!HVjb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 848w, https://substackcdn.com/image/fetch/$s_!HVjb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 1272w, https://substackcdn.com/image/fetch/$s_!HVjb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HVjb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png" width="1456" height="526" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:526,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:131052,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HVjb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 424w, https://substackcdn.com/image/fetch/$s_!HVjb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 848w, https://substackcdn.com/image/fetch/$s_!HVjb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 1272w, https://substackcdn.com/image/fetch/$s_!HVjb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f4aaf66-bac6-4161-9cf9-b54ae4fd5e54_2080x752.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>FP add 2 cannot start executing until FP add 1 has produced its result</em>, a classic Read-After-Write hazard. While a chip designer understands this very well, a programmer rarely needs to think about data dependency hazards in CPU pipelining. 
In this example, the float add dominates the effects from the load/store due to the data dependency and the long latency of floating-point add.</p><p>We add an unrolled version:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XTJf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XTJf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 424w, https://substackcdn.com/image/fetch/$s_!XTJf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 848w, https://substackcdn.com/image/fetch/$s_!XTJf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 1272w, https://substackcdn.com/image/fetch/$s_!XTJf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XTJf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png" width="1456" height="1263" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1263,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:567714,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XTJf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 424w, https://substackcdn.com/image/fetch/$s_!XTJf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 848w, https://substackcdn.com/image/fetch/$s_!XTJf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 1272w, https://substackcdn.com/image/fetch/$s_!XTJf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7842ccdc-f24c-464b-a074-42b9e2eb1bfc_2840x2464.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The unrolled version is significantly faster, as the dashed lines in the plot above show, and more importantly, we now see the linear rise we had predicted.</p><p><em>Issue source: microarchitecture, not visible in assembly instructions</em></p><h4>Issue 3: Warmup effects</h4><p>After root-causing issue 2, to avoid dealing with the unrolled loop, we changed the accumulate to a Read-Modify-Write. 
The time for each iteration is now longer because a load and store are required for each iteration, which should make data movement costs the dominating factor.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cAkz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cAkz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 424w, https://substackcdn.com/image/fetch/$s_!cAkz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 848w, https://substackcdn.com/image/fetch/$s_!cAkz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 1272w, https://substackcdn.com/image/fetch/$s_!cAkz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cAkz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png" width="1456" height="591" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/efff1b84-497c-4c57-93e9-41490759e252_2080x844.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:591,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:119668,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cAkz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 424w, https://substackcdn.com/image/fetch/$s_!cAkz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 848w, https://substackcdn.com/image/fetch/$s_!cAkz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 1272w, https://substackcdn.com/image/fetch/$s_!cAkz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefff1b84-497c-4c57-93e9-41490759e252_2080x844.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A number of stateful microarchitectural effects unrelated to the data cache contribute to performance characteristics, yet produce data cache-like behaviors. Such factors may include page table caching, page walk caching, prefetcher training, the memory controller, and even frequency ramping.</p><p>We attempted to stabilize the effect of these factors before running trials by running a warmup function at the beginning of the program. 
The warmup simply iterates over every element of data once to have the cache in a predictable state.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ef2a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ef2a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 424w, https://substackcdn.com/image/fetch/$s_!Ef2a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 848w, https://substackcdn.com/image/fetch/$s_!Ef2a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 1272w, https://substackcdn.com/image/fetch/$s_!Ef2a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ef2a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png" width="1456" height="1821" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1821,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:450653,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/187348733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ef2a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 424w, https://substackcdn.com/image/fetch/$s_!Ef2a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 848w, https://substackcdn.com/image/fetch/$s_!Ef2a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 1272w, https://substackcdn.com/image/fetch/$s_!Ef2a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc01b55e4-f41d-40fc-95e0-0b83ebd0a1f0_2184x2732.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>The results:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/QZG6B/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/297e46d2-7e49-4823-a44d-bec6357008b9_1220x818.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b8348ed8-e5b0-4f6e-9ef3-571ac613291c_1220x888.png&quot;,&quot;height&quot;:435,&quot;title&quot;:&quot;Read-Modify-Write - M2 - Time(us) vs. 
Stride(floats)&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/QZG6B/2/" width="730" height="435" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>The warmup appears to make the program faster at every stride (the effect is more pronounced on a different system in the plots below). Our best guess is that the dominant effect is the warmup ramping up the CPU frequency. We also considered whether the trial for one stride was affecting the next, but running a single stride per run of the program didn&#8217;t yield clearer results (and took much longer).</p><p>Again, if you have any better ideas, we would love to know - please leave a comment!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/cache-effects-in-object-oriented/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/cache-effects-in-object-oriented/comments"><span>Leave a comment</span></a></p><p><em>Issue source: microarchitecture / hardware</em></p><h4>Issue 4: Initial no-effect; second slope after cache line boundary</h4><p><strong>4a) </strong>In the previous plot, there is an initial region, up to a stride of about 5 floats (20 bytes), where we predicted a linear rise but instead see no effect of stride on timing.</p><p>While we are not sure, this is likely due to hardware 
prefetching: modern CPUs have hardware prefetchers that detect sequential and strided access patterns and automatically fetch data ahead of time. Once the stride grows large enough (~20-64 bytes), the prefetcher can no longer keep up, either because it can&#8217;t fetch far enough ahead or because the access pattern becomes too sparse for it to predict. At this point, we finally see the expected linear increase, as each access genuinely waits for data from main memory.</p><p><strong>4b) </strong>We expected the access time to plateau once every access was hitting a different cache line. However, there appears to be a slower rise after the cache line boundary, at least on the Apple M2 processor.</p><p>Some (unconfirmed) hypotheses for the slower rise after the boundary:</p><ul><li><p>L1 &#8594; L2 spilling if the working set exceeds L1 capacity, incorporating L2 access times</p></li><li><p>TLB misses as large strides access many different memory pages</p></li></ul><p><em>Issue sources: microarchitecture / hardware</em></p><h4>Issue 5: Different behavior on different processors</h4><p>While uncovering the previous issues, we ran a few tests on other processors, and unfortunately that only served to increase the number of unknowns. In this section we show some of those results, but can only speculate about what causes them.</p><p>With an AMD Zen5 processor:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/YGC14/3/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00cc2fe8-4256-4837-ac82-b8e1189cd916_1220x782.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6bc5baad-538b-4973-bd73-8f10a69020b9_1220x852.png&quot;,&quot;height&quot;:418,&quot;title&quot;:&quot;Accumulate - Zen5, MSVC - Time(us) vs. 
Stride(floats)&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/YGC14/3/" width="730" height="418" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/H1CJl/4/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f4941e2-4668-43c5-8a6c-04802a6d67ab_1220x818.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d74b2b08-b613-4628-b95b-95010c14d2f5_1220x938.png&quot;,&quot;height&quot;:461,&quot;title&quot;:&quot;Read-Modify-Write - Zen5, MSVC - Time(us) vs. 
Stride(floats)&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/H1CJl/4/" width="730" height="461" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>We see a plateau after an initial rise, which matches our naive prediction.</p><p>However, we observe a <strong>peak around 32 floats (128 bytes) followed by a drop</strong>. We don&#8217;t have a confirmed explanation for this behavior, though it may be related to more advanced prefetcher behavior. In other words, the hardware may be making assumptions about our access pattern, and strides of 64-128 bytes may hit the worst case, where those assumptions fail. 
If you have any ideas about the cause, let us know in the comments!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/cache-effects-in-object-oriented/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/cache-effects-in-object-oriented/comments"><span>Leave a comment</span></a></p><p>We also tested on an Intel processor on Windows, which confirmed that some of the strangest aspects of the two plots above are due to the AMD processor, and not the compiler.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/hqmDr/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79ecdefe-e9c5-4b64-a2fc-30345ffbe2b6_1220x782.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a3ed93d-feec-4b04-a686-8d67812b078a_1220x902.png&quot;,&quot;height&quot;:418,&quot;title&quot;:&quot;Read-Modify-Write - Intel MSVC - Time(us) vs. 
Stride(floats)&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/hqmDr/2/" width="730" height="418" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>This resembles our Apple M2 plots more closely, including the slower rise after the cache line boundary. It also adds an even slower rise after 2x the cache line boundary.</p><p><em>Issue sources: secret microarchitectural optimizations</em></p><h3>Back to programming</h3><p>Through this journey, it is safe to say we learned a lot about the complexity of modern processors. Fortunately, though, our central point about the (initial, then plateauing) rise of access time with stride still stands as universally true. Phew!</p><p>How do we utilize this knowledge as a programmer? <strong>The key is to ensure that commonly-accessed data is packed tightly in contiguous memory.</strong></p><p>The naive OOP concept of owning data:</p><ul><li><p>The class directly contains/owns the data as member variables</p></li><li><p>Example: Joint class with float sensed_position (and other things) embedded in it</p></li><li><p>This creates the AoS memory layout problem</p></li></ul><p><strong>Instead store indices. 
</strong>In the literature on data-oriented design, this is sometimes called the Entity-Component-System (ECS) pattern, or data-oriented design with handles.</p><ul><li><p>The class contains references, pointers, or indices to data stored elsewhere</p></li><li><p>This allows you to keep polymorphism while avoiding AoS layout issues</p></li></ul><p><strong>It isn&#8217;t data-oriented vs. polymorphism. </strong>To reiterate that data-oriented design is not opposed to OOP conveniences, consider that Pinocchio <a href="https://github.com/stack-of-tasks/pinocchio/blob/devel/include/pinocchio/multibody/joint/joint-model-base.hpp">uses polymorphism to specialize functions</a>, but stores indices into the vectors, not the data itself. The actual positions and velocities live in contiguous arrays, giving a cache-friendly SoA layout, while the polymorphic joint models provide the OOP interface. You can have the benefits of polymorphism (different joint types with specialized behavior) without the memory layout problems of AoS. This is the middle ground between pure OOP with composition and abandoning OOP entirely for data-oriented design.</p><h3>Closing thoughts</h3><p>In this post, we first showed how OOP thinking can naturally lead to suboptimal cache usage, with several real examples. Then we looked at the effects this can have, uncovering many interesting &#8220;side-quest&#8221; root-causing exercises.</p><p>It isn&#8217;t a coincidence that modern performance-critical systems say no to naive composed OOP:</p><ul><li><p><strong>Machine learning</strong> libraries will often select the data layout (NCHW etc.) 
<a href="https://mlsysbook.ai/book/contents/core/hw_acceleration/hw_acceleration.html#sec-ai-acceleration-memoryefficient-tensor-layouts-e250">transparently</a>, optimizing for cache locality.</p></li><li><p><strong>Pinocchio</strong>, a robotics kinematics / dynamics library, has its functions <a href="https://github.com/search?q=repo%3Astack-of-tasks/pinocchio%20forwardKinematics&amp;type=code">access array data</a>.</p></li><li><p><strong>Drake</strong>, a larger robotics-oriented library, eventually <a href="https://github.com/RobotLocomotion/drake/blob/master/multibody/tree/multibody_tree-inl.h">has data in arrays</a> below abstraction layers.</p></li><li><p><strong><a href="https://unity.com/dots">Unity DOTS</a></strong> stores all Transform data in packed arrays, not in GameObjects.</p></li><li><p><strong>Box2D v3.0</strong> switched from OOP bodies to ID-based handles with SoA storage.</p></li><li><p><strong><a href="https://dev.epicgames.com/documentation/en-us/unreal-engine/mass-entity-in-unreal-engine">Unreal Mass Entity</a></strong> is an ECS system for high-object-count scenarios.</p></li></ul><p>Even if in an isolated example the performance gain seems small, these patterns occur so frequently that they can <a href="https://youtu.be/fHNmRkzxHWs">add up to large losses that are difficult to eliminate</a>.</p><p>Thanks for reading! If you enjoyed this kind of full-stack analysis and root-causing, please share and subscribe for more posts on robotics, AI, and computing.</p>
<h3>References and further reading</h3><ul><li><p>Code for demonstrations in this post, and more on <a href="https://github.com/avikde/caching-tester">github</a></p></li><li><p>&#8220;Better memory representation&#8221; in Jeff Dean&#8217;s &#8220;<a href="https://abseil.io/fast/hints.html#better-memory-representation">Performance Hints</a>&#8221;</p></li><li><p>&#8220;<a href="https://youtu.be/fHNmRkzxHWs">Efficiency with Algorithms, Performance with Data Structures</a>&#8221; - Chandler Carruth [CppCon 2014]. 
<strong>Note: </strong>I don&#8217;t fully agree with the statement (10:45) that &#8220;efficiency is only affected by algorithms&#8221; - a good example is the energetic cost of moving a byte from DRAM -&gt; core being significantly higher than from L1, meaning the same algorithm with poor cache performance actually consumes more energy, in addition to completing slower.</p></li><li><p><a href="https://youtu.be/rX0ItVEVjHc">Data-Oriented Design and C++</a> - Mike Acton [CppCon 2014] </p></li><li><p>Explicit cache control via <a href="https://en.wikipedia.org/wiki/Cache_control_instruction">software prefetching</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA["Is it learning?" Online motor adaptation in end-to-end robotics]]></title><description><![CDATA[Part 2: Where the low-level controller responds to the unexpected]]></description><link>https://www.avikde.me/p/is-it-learning-online-motor-adaptation</link><guid isPermaLink="false">https://www.avikde.me/p/is-it-learning-online-motor-adaptation</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Tue, 03 Feb 2026 17:51:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x8Re!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This article is part of a series on end-to-end robotics pipelines:</em></p><ol><li><p><a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">Architecture of end-to-end: learning &#8594; control</a></p></li><li><p>This article</p></li><li><p><a href="https://www.avikde.me/p/debugging-as-architecture-insight">Dissecting a VLA</a></p></li><li><p><a href="https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">Closing the action loop with a VLM &#8220;agent&#8221;</a></p></li><li><p><a 
href="https://www.avikde.me/p/building-a-reasoning-hierarchical">Demo combining the best features of end-to-end and classical approaches</a></p></li></ol><div><hr></div><p>Last week, I wrote about modern end-to-end robotics pipelines; why this is the new north star, and the hidden architecture behind successful implementations. Part 1 reviewed some implementations showing signs of a <strong>cascade of a high-level (HL) &#8594; low-level (LL) controller</strong> in the actuation end of the pipeline:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0d685309-d58e-46dd-9d5c-6f3be4457d31&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The architecture behind &#8220;end-to-end&#8221; robotics pipelines&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Writing about safe, efficient AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-26T21:19:56.368Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!766O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-architecture-behind-end-to-end&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:185869291,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:5,&quot;comment_count&quot;:11,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>I have first-hand experience demonstrating walking robots to customers in a sandy desert, and as the robots slipped they asked, <strong>&#8220;is it learning?&#8221;</strong> With the prior of people adapting their gait as they walk on ice (for example), a reasonable expectation is that an isolated robot can adjust its behavior after some trial &amp; error or adaptation period.</p><p>However, this is not how naive foundation-model end-to-end pipelines (such as those covered in part 1) work today; a particular robot can only change its behavior once the &#8220;hive brain&#8221; is updated with new data in its training. 
Due to the size of these models, it is impractical for training to happen on-device or frequently.</p><p>So, in part 2, we ask: <strong>how can a fielded robot adapt to unexpected conditions? </strong>Why do we even need adaptability? Given the HL &#8594; LL controller cascade structure in modern end-to-end pipelines from part 1, where does this adaptability live, and how does it affect the mapping to computing hardware? Lastly, we will also look at some published implementations and see how they approach or ignore this issue.</p><h3>Updates to part 1, hot off the presses</h3><p>Before we dig into part 2, I need to add a couple of updates from relevant news releases that I wasn&#8217;t able to review before <a href="https://www.avikde.me/p/the-architecture-behind-end-to-end">part 1</a> was published (Jan 26):</p><ol><li><p><strong>Microsoft&#8217;s Rho-Alpha model announcement with <a href="https://open.substack.com/pub/bdtechtalks/p/inside-rho-alpha-microsofts-new-robotics?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">commentary on Tech talks</a> (Jan 24) reveals a &#8220;split architecture&#8221; including a dedicated low-level controller, underscoring at least two points in the part 1 post. </strong>(a) Tactile and proprioceptive information is incorporated in the action expert, showing that the action head facilitates <em>feedback</em> loops; (b) higher <em>control bandwidth</em> via a so-called bypass mechanism. Quoting the post, &#8220;The long-term goal, Kolobov said, is to have the action expert or a part of it operate on proprioception and physical sensing modalities at a significantly higher frequency than on visual and language data.&#8221;</p></li><li><p><strong><a href="https://www.figure.ai/news/helix-02">Figure Helix 02 Jan 27 update</a> reveals a new &#8220;System 0&#8221; controller, underscoring at least four points in the part 1 post</strong>. 
The &#8220;System 0&#8221; implementation is described as a dedicated whole-body controller (WBC), which conventionally converts desired accelerations or velocities to joint torques based on a model of the robot. (a) S1 went from controlling the upper body to the whole body, and this reduced the overall system complexity by <em>separating concerns</em>; (b) S0 and S1 incorporate tactile data in tighter <em>feedback loops</em>, without adding complexity to the large VLM S2; (c) S0 runs at a kHz rate, increasing the last-level <em>control bandwidth</em>; (d) it is trained for that specific robot (vs. cross-embodiment), localizing robot body-related parameters in one place (and presumably enabling generalization of S2/S1 to a different robot). The purpose of the WBC is similar to the model-based reference in part 1, but the difference here is that it is also a neural network trained from data.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p></li></ol><p>I expect that we will continue to see further evidence and refinement of hierarchical control structures in commercial robots, vs. unstructured end-to-end pipelines. Make sure to subscribe to get future updates.</p>
<p><em>This publication and this post contain the author&#8217;s personal thoughts and opinions only, and do not reflect the views of any companies or institutions.</em></p><h3>Why do we need adaptability?</h3><p>When a robot leaves the lab and is in customers&#8217; hands, it will at some point inevitably be subjected to an unexpected situation, stemming from component failure, perturbation, environmental conditions, or operating conditions (e.g. payload). To address this, one recourse is to build a large-enough model that has enough experience to handle all these situations (e.g. via domain randomization, multi-embodiment training, etc.). 
This of course takes (much) more data and more training, as OpenAI showed from their dexterity result in 2019:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NNYM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NNYM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 424w, https://substackcdn.com/image/fetch/$s_!NNYM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 848w, https://substackcdn.com/image/fetch/$s_!NNYM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 1272w, https://substackcdn.com/image/fetch/$s_!NNYM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NNYM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg" width="505" height="306.68016194331983" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:150,&quot;width&quot;:247,&quot;resizeWidth&quot;:505,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Learning Progress graph&quot;,&quot;title&quot;:&quot;Learning Progress graph&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Learning Progress graph" title="Learning Progress graph" srcset="https://substackcdn.com/image/fetch/$s_!NNYM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 424w, https://substackcdn.com/image/fetch/$s_!NNYM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 848w, https://substackcdn.com/image/fetch/$s_!NNYM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 1272w, https://substackcdn.com/image/fetch/$s_!NNYM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16687170-d356-4e69-95b0-efe87d5f10c4_1200x728.svg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Plot from <a href="https://arxiv.org/pdf/1808.00177">OpenAI Dactyl paper</a> (2019) showing the difference in required training without and with domain randomization (note the log-scale).</figcaption></figure></div><p>The other option is 
adaptation in a strategic part of the pipeline to address as many of these variations as possible. In this post, we are focused on the action end of the pipeline, and the classes of variation we are interested in include variability in joints / motors (friction, motor torque), terrain properties, and payload.</p><p>Let&#8217;s clarify the timescale hierarchy, because the word &#8220;adaptation&#8221; can refer to changes at various timescales. Within-movement corrections can happen in milliseconds, and are typically part of reactive control within the low-level controller. Skill acquisition across many tasks using large datasets during training will typically happen offline. The intermediate adjustment occurring in the seconds-to-minutes timescale, which we refer to as motor adaptation, is the focus of this post.</p><h3>Historical context from biology, control theory, and LLMs</h3><p>Cerebellar timescales (seconds to minutes) match closely with the motor adaptation timescale referred to above, and several research efforts identify the cerebellum&#8217;s role in adapting behavior in that time range.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x8Re!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x8Re!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 424w, https://substackcdn.com/image/fetch/$s_!x8Re!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 848w, 
https://substackcdn.com/image/fetch/$s_!x8Re!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 1272w, https://substackcdn.com/image/fetch/$s_!x8Re!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x8Re!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png" width="1130" height="319" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d660113e-a952-423f-ae02-10f42c37f790_1130x319.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:319,&quot;width&quot;:1130,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:307379,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/186635241?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!x8Re!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 424w, https://substackcdn.com/image/fetch/$s_!x8Re!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 
848w, https://substackcdn.com/image/fetch/$s_!x8Re!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 1272w, https://substackcdn.com/image/fetch/$s_!x8Re!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd660113e-a952-423f-ae02-10f42c37f790_1130x319.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure from <a href="https://pubmed.ncbi.nlm.nih.gov/26646076/">Weaver (2015)</a> (commentary on <a 
href="https://pubmed.ncbi.nlm.nih.gov/26645916/">Kim (2015)</a>) showing the role of the cerebellum in storing multiple internal models and adapting at different timescales.</figcaption></figure></div><p><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC6674518">Morton (2006)</a> further associates the cerebellum with motor adaptation, and lower neural centers (such as the spinal cord) with reactive control:</p><blockquote><p>Cerebellar damage does not impair the ability to make reactive feedback-driven motor adaptations, but significantly disrupts predictive feedforward motor adaptations during splitbelt treadmill locomotion &#8230; The cerebellum seems to play an essential role in predictive but not reactive locomotor adjustments. We postulate that reactive adjustments may instead be predominantly controlled by lower neural centers, such as the spinal cord or brainstem.</p></blockquote><p>In control theory, there is a long tradition of adaptive control, including model-reference adaptive control (MRAC), which uses a (model-based) adjustment mechanism to modify the parameters of the controller.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Uzct!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uzct!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 424w, https://substackcdn.com/image/fetch/$s_!Uzct!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 848w, 
https://substackcdn.com/image/fetch/$s_!Uzct!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 1272w, https://substackcdn.com/image/fetch/$s_!Uzct!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uzct!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png" width="462" height="266.2916666666667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:332,&quot;width&quot;:576,&quot;resizeWidth&quot;:462,&quot;bytes&quot;:25531,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/186635241?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Uzct!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 424w, https://substackcdn.com/image/fetch/$s_!Uzct!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 
848w, https://substackcdn.com/image/fetch/$s_!Uzct!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 1272w, https://substackcdn.com/image/fetch/$s_!Uzct!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c59dd95-045d-4279-8148-e4cd4c9e13b4_576x332.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Fig. 
5.1 in Astrom &amp; Wittenmark &#8220;Adaptive Control&#8221; shows the block diagram of a model-reference adaptive system (MRAS).</figcaption></figure></div><p>The arrows in the figure above reveal a degree of interconnectedness beyond the cascade connections we primarily reviewed in part 1. The adjustment mechanism can also act in discrete steps instead of continuously, or without a model; when it simply selects pre-computed controller parameters based on measured operating conditions, it is called &#8220;gain scheduling&#8221;.</p><p>Self-improving learning systems are beginning to appear in the news more frequently in the LLM world: Ilya Sutskever <a href="https://www.dwarkesh.com/p/ilya-sutskever-2">said in Nov 2025</a>, &#8220;There has been one big idea that everyone has been locked into, which is the self-improving AI&#8221;. The aforementioned Rho-Alpha model can update its weights while running, using teleoperation feedback. However, this can lead to a <a href="https://arxiv.org/abs/2510.15103">common side-effect</a> called &#8220;catastrophic forgetting&#8221;, because all the weights live in one huge monolithic structure, so updates need to be made either in judiciously chosen layers or in careful batches.</p><h3>Motor adaptation in practice</h3><p>One advantage of robotics pipelines is that they may (as we saw in part 1) have a hierarchical HL &#8594; LL structure. In such a situation, there are <em>motor</em> adaptations that can be integrated into the LL controller without impacting the behavior of the HL controller, sidestepping the catastrophic forgetting issue.</p><p>I&#8217;ll go over a few illustrative examples, paying particular attention to their ability to handle unexpected conditions. 
If I missed a pertinent idea, let me know in the comments:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/is-it-learning-online-motor-adaptation/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation/comments"><span>Leave a comment</span></a></p><h4>Adaptation in model-based LL control: robot arms, drones, humanoids</h4><p>The pre-&#8220;end-to-end&#8221; era had many examples of adaptation in practice. Old ideas such as MRAC show up in industrial and commercial manipulators, for example in the <a href="https://www.universal-robots.com/manuals/EN/HTML/SW5_19/Content/prod-usr-man/software/PolyScope/content/installation_g5/Payload_en.htm">payload estimation</a> feature of Universal Robots arms. Commercial drones estimate wind to remain stable, sometimes <a href="https://arxiv.org/abs/2205.06908">using neural networks</a>. In a <a href="https://arxiv.org/pdf/1904.12306">2019 HyQ paper</a>, an explicit terrain compliance estimation module estimates parameters used by the LL controller. 
In a 2023 demonstration of the Atlas robot using model-based controllers while picking up heavy objects, Atlas &#8220;<a href="https://spectrum.ieee.org/atlas-robot">has access to the mass properties</a>&#8221; of the object it is picking up, which I would lump into a gain-scheduling type of approach.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EYci!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EYci!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 424w, https://substackcdn.com/image/fetch/$s_!EYci!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 848w, https://substackcdn.com/image/fetch/$s_!EYci!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 1272w, https://substackcdn.com/image/fetch/$s_!EYci!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EYci!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png" width="589" height="162.3919523099851" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:185,&quot;width&quot;:671,&quot;resizeWidth&quot;:589,&quot;bytes&quot;:31996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/186635241?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EYci!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 424w, https://substackcdn.com/image/fetch/$s_!EYci!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 848w, https://substackcdn.com/image/fetch/$s_!EYci!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 1272w, https://substackcdn.com/image/fetch/$s_!EYci!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce349895-7cb9-43bc-a05f-29ec036d4351_671x185.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure from <a href="https://arxiv.org/pdf/1904.12306">Fahmi et al (2019)</a> showing adaptation module interfacing with model-based WBC.</figcaption></figure></div><p>In all these 
examples, the LL controller is model-based, which makes it easier to adapt to quantities like payload mass: it is clear where those terms appear in the controller. This is an advantage of physically interpretable parameters over black-box latent-space interconnections.</p><p>Mapping to computation:</p><pre><code>HL &#8594; WBC inverse dynamics/QP (CPU) &#8594; Joint/servo controllers (microcontroller/CPU) &#8594; Torques
                &#8593;
    *Adjustment mechanism (CPU/GPU)*</code></pre><h4>Meta-learning for adapting among training environments</h4><p>The concept of <a href="https://arxiv.org/pdf/1803.11347">meta-learning (2019)</a> targets the motor adaptation problem, but needs samples from many environments during training. This leads to the aforementioned prolonged training and large models, and leaves the policy vulnerable to truly unexpected (out-of-distribution) conditions. Some of the paper&#8217;s authors are among the founders of Physical Intelligence, so it is possible that they could adopt meta-learning-type methods for online adaptation in their action expert (not the case today as far as I can tell).</p><p>Mapping to computation of this hypothetical scenario:</p><pre><code>VLM (GPU) &#8594; Action expert *with internal model and meta-learning* (GPU/CPU) &#8594; Trajectory tracking (CPU) &#8594; Torques</code></pre><h4>Learning-based latent parameter estimation for locomotion</h4><p>As my sand locomotion example above might hint, unexpected payload and terrain conditions are particularly prevalent in locomotion.</p><p><a href="https://ashish-kmr.github.io/rma-legged-robots/">RMA: Rapid Motor Adaptation (2021)</a> introduces a dedicated adaptation module that predicts a set of &#8220;latent parameters&#8221; that adjust the action policy to better suit different conditions. 
These conditions are covered during training by randomizing them in simulation, so the approach potentially suffers from some of the same out-of-distribution and training-difficulty issues.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uCHu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uCHu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 424w, https://substackcdn.com/image/fetch/$s_!uCHu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 848w, https://substackcdn.com/image/fetch/$s_!uCHu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 1272w, https://substackcdn.com/image/fetch/$s_!uCHu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uCHu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png" width="941" height="217" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:217,&quot;width&quot;:941,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:147962,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/186635241?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uCHu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 424w, https://substackcdn.com/image/fetch/$s_!uCHu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 848w, https://substackcdn.com/image/fetch/$s_!uCHu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 1272w, https://substackcdn.com/image/fetch/$s_!uCHu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfb14893-a318-4a4f-a673-a4a6095e8578_941x217.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">RMA figure from <a href="https://ashish-kmr.github.io/rma-legged-robots/rma-locomotion-final.pdf">their paper</a> showing adaptation module running at a lower 
rate.</figcaption></figure></div><p>One of the authors co-founded Skild.AI, and this quote from their <a href="https://www.skild.ai/blogs/one-policy-all-scenarios">Aug 2025 blog post</a></p><blockquote><p>A striking aspect of our model is that it is not just <em><strong>robust</strong></em>, but it is also <em><strong>adaptive</strong></em> and <em>graceful</em></p></blockquote><p>(emphasis theirs) suggests incorporation of something like RMA. Absent further details, here is my best guess of the composed pipeline mapped to computational hardware:</p><pre><code><code>HL action policy (GPU) &#8594; *Adaptation module (GPU)* &#8594; LL action policy (GPU) &#8594; Torques</code></code></pre><p>Where RMA had a large-ish latent vector, there are similar approaches that predict parameters with more physical meaning, from a <a href="https://www.science.org/doi/10.1126/scirobotics.ade2256">reduced</a> or a <a href="https://arxiv.org/abs/2202.05481">full state estimate</a>. These state-estimation networks learn base state and contact probabilities concurrently with the policy, enabling better perception of ground interactions.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aaYI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aaYI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 424w, https://substackcdn.com/image/fetch/$s_!aaYI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 
848w, https://substackcdn.com/image/fetch/$s_!aaYI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 1272w, https://substackcdn.com/image/fetch/$s_!aaYI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aaYI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png" width="623" height="220.03829787234042" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:249,&quot;width&quot;:705,&quot;resizeWidth&quot;:623,&quot;bytes&quot;:67371,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/186635241?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aaYI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 424w, 
https://substackcdn.com/image/fetch/$s_!aaYI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 848w, https://substackcdn.com/image/fetch/$s_!aaYI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 1272w, https://substackcdn.com/image/fetch/$s_!aaYI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b1bd46-74b1-4ce7-b960-91e9c0bed917_705x249.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure from <a href="https://www.science.org/doi/10.1126/scirobotics.ade2256">Choi et al (2023)</a> showing state estimation network utilized for locomotion on sand.</figcaption></figure></div><p>The end-result pipeline is quite similar, just potentially decomposing the adaptation module a bit:</p><pre><code><code>HL command &#8594; *History encoder (GPU) &#8594; Estimator (GPU)* &#8594; Actor (GPU) &#8594; Impedance control (CPU) &#8594; Torques</code></code></pre><h4>In-context learning to fix recent mistakes</h4><p>A different method called in-context learning (appearing in <a href="https://covariant.ai/insights/rfm-1-update-in-context-learning-to-improve-grasping/">Covariant.AI&#8217;s Mar 2024 blog post</a>, and in the <a href="https://arxiv.org/abs/2508.02062">RICL method from Aug 2025</a>) attends to recent <em>action history</em> as opposed to encoded observation history. 
These relevant demonstrations are added to the VLA context before its forward pass.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lCAt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lCAt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 424w, https://substackcdn.com/image/fetch/$s_!lCAt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 848w, https://substackcdn.com/image/fetch/$s_!lCAt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 1272w, https://substackcdn.com/image/fetch/$s_!lCAt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lCAt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png" width="658" height="289.19217081850536" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:494,&quot;width&quot;:1124,&quot;resizeWidth&quot;:658,&quot;bytes&quot;:252599,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/186635241?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lCAt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 424w, https://substackcdn.com/image/fetch/$s_!lCAt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 848w, https://substackcdn.com/image/fetch/$s_!lCAt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 1272w, https://substackcdn.com/image/fetch/$s_!lCAt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd967cf-1832-4bbe-a40b-66cde471cd0c_1124x494.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">RICL architecture from their <a href="https://arxiv.org/pdf/2508.02062">paper</a>, showing a new </figcaption></figure></div><p>The end-result pipeline adds a retrieval buffer of demonstrations before the VLA, and an interpolation unit after the action module:</p><pre><code><code>*Retrieval buffer* &#8594; VLM (GPU) &#8594; Action expert (GPU/CPU) &#8594; *Action interpolation (CPU/GPU)* &#8594; Trajectory tracking (CPU) &#8594; Torques</code></code></pre><p>This method is in a slightly different category: relevant demonstrations must first occur and then be reflected upon before the robot adapts, in contrast to the potentially faster adaptation enabled by the previous methods. 
This strategy would not be sensible for time-sensitive or safety-critical tasks, but is categorically different and seemed worth reviewing.</p><h3>Closing thoughts</h3><p>In part 2 of this article series reviewing modern end-to-end robotics pipelines, we discussed why it may be useful for fielded robots to have some adaptation capability to handle unexpected conditions, and reviewed some examples of how it can be implemented. We also discussed some historical context from biology and control theory.</p><p>In part 3, we will try to get more hands-on and use what we learned from the first two parts to build up an effective pipeline from scratch. I&#8217;m still debating whether to use existing tools such as Isaac Sim or build even more from first principles for clarity, so it may take some time before we get there. If you have any suggestions or feedback, let me know in the comments. If you found this article interesting, please share and subscribe for future posts. Thanks for reading!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/is-it-learning-online-motor-adaptation/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation/comments"><span>Leave a comment</span></a></p><div class="subscription-widget-wrap-editor" 
data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>It would be interesting to compare the complexity of a model-based vs. neural network implementation of this function (maybe we can try that in part 3).</p></div></div>]]></content:encoded></item><item><title><![CDATA[The architecture behind “end-to-end” robotics pipelines]]></title><description><![CDATA[Part 1: Where the learning stack ends and the control stack begins]]></description><link>https://www.avikde.me/p/the-architecture-behind-end-to-end</link><guid isPermaLink="false">https://www.avikde.me/p/the-architecture-behind-end-to-end</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Mon, 26 Jan 2026 21:19:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!766O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This article is part of a series on end-to-end robotics pipelines:</em></p><ol><li><p>This article</p></li><li><p><a 
href="https://www.avikde.me/p/is-it-learning-online-motor-adaptation?r=5vzx85">Online motor adaptation</a></p></li><li><p><a href="https://www.avikde.me/p/debugging-as-architecture-insight">Dissecting a VLA</a></p></li><li><p><a href="https://www.avikde.me/p/a-coding-agent-equivalent-for-robotics?r=5vzx85&amp;utm_campaign=post&amp;utm_medium=web">Closing the action loop with a VLM &#8220;agent&#8221;</a></p></li><li><p><a href="https://www.avikde.me/p/building-a-reasoning-hierarchical">Demo combining the best features of end-to-end and classical approaches</a></p></li></ol><div><hr></div><p>Recent progress and excitement in humanoid robotics are largely driven by rapid gains in generalist capabilities. Historically, most robots were engineered for narrow, well-defined tasks. The current wave of companies, in contrast, is pursuing systems intended to operate across a broad range of activities, shifting both public and economic expectations toward robots that can serve as general-purpose physical agents.</p><p>A central part of this shift is the widespread claim of <em>end-to-end</em> pipelines, often described as going from &#8220;pixels to actions,&#8221; in contrast to earlier approaches built from hand-designed perception, planning, and control modules. This post examines what &#8220;end-to-end&#8221; means in practice: where the pipeline actually begins and ends, the tradeoffs between different architectural choices, and how the algorithms map to computing hardware.</p><p>Part 1 focuses on the &#8220;actions&#8221; side of &#8220;pixels to actions&#8221;: how learned systems interface with the physical control of the robot body. Part 2 will examine how these architectures adapt to environmental uncertainty and contact-rich interaction. 
Later parts will include hands-on comparisons using small standalone examples to make these differences concrete.</p><p><em>This publication and this post contain the author&#8217;s personal thoughts and opinions only, and do not reflect the views of any companies or institutions.</em></p><h3>Why &#8220;end-to-end&#8221;</h3><p>Classical AI was built up from a strict idea of separation of sensing, planning, and action. To my knowledge, the first robot to embody Sense-Plan-Act was <a href="https://en.wikipedia.org/wiki/Shakey_the_robot">Shakey the robot</a> (~1970), which also employed one of the first <a href="https://en.wikipedia.org/wiki/Stanford_Research_Institute_Problem_Solver">symbolic AI systems</a>. 
This tiered structure was so formative to robotics research that most research labs today are dedicated to different portions of this hierarchy, such as &#8220;perception&#8221;, &#8220;planning&#8221;, or &#8220;locomotion&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!766O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!766O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!766O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!766O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!766O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!766O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg" width="1200" height="900" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;SRI researchers Nils Nilsson (right) and Sven Wahlstrom with Shakey the Robot in the late 1960s.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="SRI researchers Nils Nilsson (right) and Sven Wahlstrom with Shakey the Robot in the late 1960s." title="SRI researchers Nils Nilsson (right) and Sven Wahlstrom with Shakey the Robot in the late 1960s." srcset="https://substackcdn.com/image/fetch/$s_!766O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!766O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!766O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!766O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc93dedb4-82d1-4d59-b085-1a7493ed6040_1200x900.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Shakey the robot in the late 1960&#8217;s (photo from <a href="https://spectrum.ieee.org/sri-shakey-robot-honored-as-ieee-milestone">here</a>).</figcaption></figure></div><p>The sense-plan-act view today is dying a very rapid death. 
The modern narrative of general-purpose robotics holds that modular pipelines often fail because of limitations imposed by this decoupling; for example, perception errors break planners, planners produce infeasible motions, and, most importantly, interfaces encode wrong assumptions.</p><p>As influential AI researcher <a href="https://sergeylevine.substack.com/p/sporks-of-agi">Sergey Levine puts it</a>,</p><blockquote><p>for any learning-enabled system, any component that is <em>not</em> learned but instead designed by hand will eventually become the bottleneck to its performance</p></blockquote><p>End-to-end training avoids hand-designed intermediate representations, manually tuned cost functions, and any bottlenecks imposed by module interfaces.</p><p>Additionally, &#8220;end-to-end&#8221; sends a sociological signal: it associates the company with modern AI foundation models and scalability with data, and positions it as an AI lab instead of a controls shop.</p><h3>The action end in practice</h3><p>The practical reality of &#8220;end-to-end&#8221; is more subtle than it might seem. In this section we&#8217;ll review what some published academic and commercial implementations actually appear to be doing today, and also try to outline how each implementation is mapped to computational hardware.</p><h4>The old way: model-based stacks (~2014)</h4><p>It is very common to have a whole-body controller at the low level, as exemplified by the <a href="https://groups.csail.mit.edu/robotics-center/public_papers/Kuindersma14.pdf">2014 MIT Atlas team&#8217;s report</a>. 
After a high-level plan is created, a tracking controller is implemented as a quadratic program, and that generates the signals sent to the actuators:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NdCg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NdCg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 424w, https://substackcdn.com/image/fetch/$s_!NdCg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 848w, https://substackcdn.com/image/fetch/$s_!NdCg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 1272w, https://substackcdn.com/image/fetch/$s_!NdCg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NdCg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png" width="580" height="571" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ea62b32-540b-4439-9103-3401ae70d839_580x571.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:571,&quot;width&quot;:580,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58557,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/185869291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NdCg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 424w, https://substackcdn.com/image/fetch/$s_!NdCg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 848w, https://substackcdn.com/image/fetch/$s_!NdCg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 1272w, https://substackcdn.com/image/fetch/$s_!NdCg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea62b32-540b-4439-9103-3401ae70d839_580x571.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 6 from the <a href="https://groups.csail.mit.edu/robotics-center/public_papers/Kuindersma14.pdf">2014 MIT Atlas team&#8217;s report</a> showing the low-level action pipeline, referred to as &#8220;Control.&#8221;</figcaption></figure></div><p>Mapping to computational hardware:</p><p><em>Trajectory optimizer (CPU) &#8594; WBC inverse dynamics/QP (CPU) &#8594; Joint/servo controllers (microcontroller/CPU) &#8594; Torques</em></p><h4>Learning followed by impedance controller (~2017-2020)</h4><p>To my knowledge, the first fielded robots using learning-based locomotion controllers appeared ~2018 from Google (using <a href="https://www.avikde.me/p/ghost-robotics-minitaur">Minitaur</a>) and in Marco Hutter&#8217;s group. 
As documented in the <a href="https://arxiv.org/pdf/1804.10332">2018 paper from Google</a> and the <a href="https://arxiv.org/pdf/1901.08652">highly cited Hwangbo et al. (2019) paper</a>, the most effective choice of action space was an impedance controller, a choice in turn influenced by <a href="https://arxiv.org/pdf/1611.01055">Peng et al. (2017)</a>:</p><blockquote><p>Our experiments suggest that action parameterizations that include basic local feedback, such as PD target angles, MTU activations, or target velocities, can improve policy performance and learning speed across different motions and character morphologies</p></blockquote><p>The policy outputs desired joint positions and sometimes velocity offsets or gain modulation, and the applied torque is given by a simple algebraic equation:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tau = K_p(q_{des} - q) + K_d(\\dot{q}_{des} - \\dot{q})&quot;,&quot;id&quot;:&quot;WKLXTSXCPG&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, K_p and K_d are the stiffness and damping gains, q is the measured joint position, and q_des is the target output by the policy. The virtue of this architecture is that it is very generic and succeeds in decoupling the fast time scales and discontinuities of making and breaking contact from the learning algorithm.</p><p>Mapping to computational hardware:</p><p><em>Policy eval (CPU/embedded GPU) &#8594; Impedance controller (CPU) &#8594; Actuators</em></p><h4>Figure AI&#8217;s &#8220;System 1&#8221; policy (2025)</h4><p><a href="https://www.figure.ai/news/helix">Figure AI&#8217;s Feb 2025 blog post</a> describes a &#8220;System 2 / System 1&#8221; design where a high-level vision-language model (S2) reasons about goals and semantics at low frequency, and a fast visuomotor network (S1) executes continuous control at high frequency. While this reflects a separation of timescales and roles, both modules are trained end-to-end with an abstract latent interface, meaning there is no principled, physically interpretable handoff between high-level strategy and low-level control. 
As a result, Helix achieves generalization in perception and task reasoning but does not isolate physical control concerns (such as dynamics stabilization, contact interaction, or actuation abstraction) into structured model-based or classical control modules.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qh5X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qh5X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 424w, https://substackcdn.com/image/fetch/$s_!Qh5X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 848w, https://substackcdn.com/image/fetch/$s_!Qh5X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 1272w, https://substackcdn.com/image/fetch/$s_!Qh5X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qh5X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png" width="1322" height="596" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:596,&quot;width&quot;:1322,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:162629,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/185869291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Qh5X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 424w, https://substackcdn.com/image/fetch/$s_!Qh5X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 848w, https://substackcdn.com/image/fetch/$s_!Qh5X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 1272w, https://substackcdn.com/image/fetch/$s_!Qh5X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca99354-c9d3-466f-a3a5-d7a765c02d49_1322x596.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure AI&#8217;s architecture from <a href="https://www.figure.ai/news/helix">their blog post</a>.</figcaption></figure></div><p></p><p>In a <a href="https://www.figure.ai/news/reinforcement-learning-walking">Mar 2025 blog post</a>, they describe what sounds more like the impedance controller above than the system 1 design, so it&#8217;s possible some combination of both architectures is utilized:</p><blockquote><p>We additionally run the policy output through kHz-rate closed-loop torque control to compensate for errors in actuator modeling</p></blockquote><p>Mapping to computational hardware:</p><p><em>System 2 (Transformer, GPU) &#8594; System 1 (Network, GPU) &#8594; [Impedance control (CPU)] &#8594; Torques</em></p><h4>Physical Intelligence&#8217;s action expert (2025)</h4><p>The <a 
href="https://www.pi.website/research/knowledge_insulation">architecture described</a> is similar to the system 1 above, but specifically suggests that the end-to-end training causes problems:</p><blockquote><p>When adapting a VLM to a VLA in this action expert design, the VLM backbone representations are exposed to the gradients from the action expert. Our experiments show that those gradients from the action expert lead to unfavorable learning dynamics, which not only results in much slower learning, but also causes the VLM backbone to lose some of the knowledge acquired during web-scale pre-training.</p></blockquote><p>This is conceptually analogous to known problems like <a href="https://en.wikipedia.org/wiki/Vanishing_gradient_problem">vanishing/exploding gradients</a> in deep nets, where lower layers dominate or drown out meaningful gradients for higher layers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ll11!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ll11!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 424w, https://substackcdn.com/image/fetch/$s_!Ll11!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 848w, https://substackcdn.com/image/fetch/$s_!Ll11!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Ll11!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ll11!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png" width="1295" height="595" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:595,&quot;width&quot;:1295,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:68561,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/185869291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ll11!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 424w, https://substackcdn.com/image/fetch/$s_!Ll11!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 848w, https://substackcdn.com/image/fetch/$s_!Ll11!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 
1272w, https://substackcdn.com/image/fetch/$s_!Ll11!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a921377-0f3c-4390-ae1d-9b4de26babd8_1295x595.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Physical Intelligence&#8217;s architecture from <a href="https://www.pi.website/research/knowledge_insulation">their blog post</a>.</figcaption></figure></div><p></p><p>Another blog post describes issues to do with the mismatched control bandwidth of foundation model output to robot dynamics, solved by <a 
href="https://www.pi.website/research/real_time_chunking">outputting short-horizon trajectories</a> that are played out by a low-level controller.</p><p>Mapping to computational hardware:</p><p><em>VLM (GPU) &#8594; Action expert (GPU/CPU) &#8594; Trajectory tracking (CPU) &#8594; Torques</em></p><h4>Boston Dynamics + TRI&#8217;s pose tracking (2025)</h4><p>Their <a href="https://bostondynamics.com/blog/large-behavior-models-atlas-find-new-footing/">blog post describes</a> an architecture in which the higher-level cognitive layer outputs joint positions and end-effector poses. While there isn&#8217;t an explicit description of how these position setpoints are tracked, the post mentions Atlas&#8217;s MPC, and it is reasonable to assume that this is the lower-level controller.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JBTF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JBTF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 424w, https://substackcdn.com/image/fetch/$s_!JBTF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 848w, https://substackcdn.com/image/fetch/$s_!JBTF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JBTF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JBTF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png" width="1024" height="372" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:372,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Our policy maps inputs consisting of images, proprioception and language prompts to actions that control the full Atlas robot at 30Hz. We leverage a diffusion transformer together with a flow matching loss to train our model.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Our policy maps inputs consisting of images, proprioception and language prompts to actions that control the full Atlas robot at 30Hz. We leverage a diffusion transformer together with a flow matching loss to train our model." title="Our policy maps inputs consisting of images, proprioception and language prompts to actions that control the full Atlas robot at 30Hz. We leverage a diffusion transformer together with a flow matching loss to train our model." 
srcset="https://substackcdn.com/image/fetch/$s_!JBTF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 424w, https://substackcdn.com/image/fetch/$s_!JBTF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 848w, https://substackcdn.com/image/fetch/$s_!JBTF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 1272w, https://substackcdn.com/image/fetch/$s_!JBTF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2410984-5712-4c1b-bdcb-55e45fe63d1f_1024x372.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Boston Dynamics + TRI architecture from <a href="https://bostondynamics.com/blog/large-behavior-models-atlas-find-new-footing/">their blog post</a>.</figcaption></figure></div><p>Mapping to computational hardware:</p><p><em>LBM inference (GPU) &#8594; MPC (CPU) &#8594; Actuator torques</em></p><h4>1X&#8217;s inverse dynamics model IDM (2026)</h4><p>1X also describes a hierarchy in <a href="https://www.1x.tech/discover/world-model-self-learning">their blog post</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F2IG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F2IG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 424w, https://substackcdn.com/image/fetch/$s_!F2IG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 848w, https://substackcdn.com/image/fetch/$s_!F2IG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 1272w, 
https://substackcdn.com/image/fetch/$s_!F2IG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F2IG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png" width="1041" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1041,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:302337,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/185869291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F2IG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 424w, https://substackcdn.com/image/fetch/$s_!F2IG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 848w, https://substackcdn.com/image/fetch/$s_!F2IG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 
1272w, https://substackcdn.com/image/fetch/$s_!F2IG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4953307e-d6f5-4b45-b35c-f5c13ccf29f4_1041x628.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">1X architecture from <a href="https://www.1x.tech/discover/world-model-self-learning">their blog post</a>.</figcaption></figure></div><p>World Model Backbone (WM): A text-conditioned video prediction model trained on internet-scale video data and fine-tuned on robot sensorimotor data. 
It predicts future visual states based on current observations and candidate actions.</p><p>Inverse Dynamics Model (IDM): Converts predicted future states into feasible robot action sequences that will produce those outcomes in the real world. The use of the term &#8220;inverse dynamics&#8221; suggests that the output actions are torques, though that isn&#8217;t specified.</p><p>Mapping to computational hardware:</p><p><em>World model (GPU) &#8594; IDM (GPU) &#8594; Actuator torques</em></p><h3>Why not end-to-end</h3><p>From the previous section, it is apparent that &#8220;end-to-end&#8221; doesn&#8217;t usually mean that a single algorithm or network goes from pixels to torques. In this section, we&#8217;ll list some intuitive reasons for this.</p><h4>Separation of concerns</h4><p>We saw above in Physical Intelligence&#8217;s blog post that it is difficult to train an end-to-end policy that does so many different things. <a href="https://www.pi.website/research/knowledge_insulation">Another quote</a>:</p><blockquote><p>One hypothesis of why this is happening is the following. A pre-trained VLM, by its nature, pays attention to language inputs well. The gradients from the action expert now severly interfere with the model&#8217;s ability to process language, which leads the model to pick up on other correlations first.</p></blockquote><p>These problems are a side-effect of one network trying to solve a lot of different problems. 
The old Sense-Plan-Act schema enforced a separation of concerns very strictly, but even with a more relaxed architecture, low-level control priors drastically reduce the policy search space.</p><p>The human nervous system exhibits a similar separation, with a cortex (goal-directed commands), cerebellum (fast adaptation, prediction), spinal reflexes (fast control loops), and even mechanical impedance control in muscles and tendons.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x-HL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x-HL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 424w, https://substackcdn.com/image/fetch/$s_!x-HL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 848w, https://substackcdn.com/image/fetch/$s_!x-HL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!x-HL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!x-HL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg" width="900" height="644" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:644,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;This diagram shows the complete pathway a nerve impulse takes when a person tests the temperature of shower water with their hand. First, a sensory nerve ending in the index finger sends a nerve impulse to the spinal cord. A cross section of one segment of the vertebrae is shown from a superior view. The sensory nerve connected to the nerve ending is located in the dorsal root ganglion. The nerve ending is a dendrite of the sensory neuron, as it also has an axon that synapses with an interneuron. The interneuron then synapses with a second interneuron in the thalamus. This second interneuron synapses with brain tissue in the cerebral cortex, allowing conscious perception of the water temperature. The brain then initiates a motor command by stimulating an upper motor neuron in the cerebral cortex. The axon of the upper motor neuron extends all the way to the spinal cord, where it synapses with a lower motor neuron in the gray matter of the spinal cord. The impulse then travels down the lower motor neuron back to the hand where it synapses with the skeletal muscles of the hand. 
This triggers the muscle contractions that turn the dials of the shower to adjust the water temperature.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="This diagram shows the complete pathway a nerve impulse takes when a person tests the temperature of shower water with their hand. First, a sensory nerve ending in the index finger sends a nerve impulse to the spinal cord. A cross section of one segment of the vertebrae is shown from a superior view. The sensory nerve connected to the nerve ending is located in the dorsal root ganglion. The nerve ending is a dendrite of the sensory neuron, as it also has an axon that synapses with an interneuron. The interneuron then synapses with a second interneuron in the thalamus. This second interneuron synapses with brain tissue in the cerebral cortex, allowing conscious perception of the water temperature. The brain then initiates a motor command by stimulating an upper motor neuron in the cerebral cortex. The axon of the upper motor neuron extends all the way to the spinal cord, where it synapses with a lower motor neuron in the gray matter of the spinal cord. The impulse then travels down the lower motor neuron back to the hand where it synapses with the skeletal muscles of the hand. This triggers the muscle contractions that turn the dials of the shower to adjust the water temperature." title="This diagram shows the complete pathway a nerve impulse takes when a person tests the temperature of shower water with their hand. First, a sensory nerve ending in the index finger sends a nerve impulse to the spinal cord. A cross section of one segment of the vertebrae is shown from a superior view. The sensory nerve connected to the nerve ending is located in the dorsal root ganglion. 
The nerve ending is a dendrite of the sensory neuron, as it also has an axon that synapses with an interneuron. The interneuron then synapses with a second interneuron in the thalamus. This second interneuron synapses with brain tissue in the cerebral cortex, allowing conscious perception of the water temperature. The brain then initiates a motor command by stimulating an upper motor neuron in the cerebral cortex. The axon of the upper motor neuron extends all the way to the spinal cord, where it synapses with a lower motor neuron in the gray matter of the spinal cord. The impulse then travels down the lower motor neuron back to the hand where it synapses with the skeletal muscles of the hand. This triggers the muscle contractions that turn the dials of the shower to adjust the water temperature." srcset="https://substackcdn.com/image/fetch/$s_!x-HL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 424w, https://substackcdn.com/image/fetch/$s_!x-HL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 848w, https://substackcdn.com/image/fetch/$s_!x-HL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!x-HL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ce19781-7069-42d4-9d35-c62bd56bf76e_900x644.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Nervous system components (figure from <a href="https://courses.lumenlearning.com/umd-publichealthbio/chapter/the-function-of-nervous-tissue/">here</a>).</figcaption></figure></div><h4>Training complexity</h4><p>Related to the separation of concerns above, an end-to-end network must learn contact mechanics, actuator dynamics, delays, friction, impact stabilization, as well as task-level planning, all from one gradient signal.</p><p>This creates extremely long credit-assignment chains and high sample complexity. Hierarchical control factorizes the learning problem.</p><h4>Feedback control loops; tactile and force feedback</h4><p>With a fully end-to-end system, any feedback on how execution is going can only come in at the top. 
In contrast, a dedicated low-level control unit can run its own feedback controller that performs stabilization functions. This is in effect what we saw with the selection of the impedance controller in the Peng and Hutter papers above.</p><p>Secondly, a low-level controller also provides a great opportunity to incorporate a rich set of sensory signals such as tactile and force feedback information. Rodney Brooks underlines the importance of non-visual feedback in his <a href="https://rodneybrooks.com/why-todays-humanoids-wont-learn-dexterity/">Sep 2025 essay</a>, going as far as to flag it as a roadblock. The problem is, if you must have force feedback in an end-to-end model, you first have to contend with the lack of large-scale force data to train it on, as well as the much larger end-to-end model you now have to train and evaluate at inference time. As I wrote in response to a Substack comment <a href="https://substack.com/@avikde/note/c-203946866?r=5vzx85&amp;utm_source=notes-share-action&amp;utm_medium=web">here</a>, a low-level control unit is one potential way to incorporate that data without increasing the dimensionality of the higher-level brain.</p><h4>Control bandwidth</h4><p>Real-world physics and dynamics don&#8217;t wait for end-to-end inference to complete, and most implementations (Physical Intelligence&#8217;s action chunking, Figure&#8217;s rate-decoupled system 1, etc.) 
need to decouple the control bandwidth of the cognitive layer from that of the low-level controller.</p><p><span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Chris Paxton&quot;,&quot;id&quot;:232680664,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!13Dp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a886fd-347d-4694-b670-0253975d2ba9_659x547.png&quot;,&quot;uuid&quot;:&quot;e9159de1-8107-495d-825e-bdb80a0bb838&quot;}" data-component-name="MentionToDOM">Chris Paxton</span> talks about this aspect as an action inference limitation in his excellent <a href="https://itcanthink.substack.com/p/vision-language-action-models-and">post about VLAs</a>, which you should read if you haven&#8217;t.</p><h4>Sim2real transfer</h4><p>As discussed in my recent <a href="https://www.avikde.me/p/the-ai-world-models-debate-and-its">world models post</a>, almost all of these implementations that utilize large-scale demonstration data need to follow it up with reinforcement learning post-training in simulation. This surfaces an issue that has been named &#8220;sim2real transfer,&#8221; where the simulator&#8217;s accuracy can limit the deployed behavior. This has a number of solutions including domain randomization and actuator networks, but alternatively, a low-level controller can in many cases absorb modeling error through its inverse dynamics functionality. Physics errors affect torque-level policies massively, whereas impedance control, whole-body control, and model-predictive control use feedback to drive the resulting mismatch toward zero.</p><h4>Safety constraints</h4><p>We can explicitly add torque constraints, joint kinematic limits, and self-collision avoidance to a low-level controller. 
This is intuitively true, but I&#8217;ll point to a <a href="https://umi-ft.github.io/">recent research paper</a> that found exactly this. Quoting the author:</p><blockquote><p>Introducing UMI-FT: the UMI gripper equipped with force/torque sensors (CoinFT) on each finger. Multimodal data from UMI-FT, combined with diffusion policy and compliance control, enables robots to apply sufficient yet safe force for task completion. </p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1u9p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1u9p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 424w, https://substackcdn.com/image/fetch/$s_!1u9p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 848w, https://substackcdn.com/image/fetch/$s_!1u9p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 1272w, https://substackcdn.com/image/fetch/$s_!1u9p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1u9p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png" width="633" height="271" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:271,&quot;width&quot;:633,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1u9p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 424w, https://substackcdn.com/image/fetch/$s_!1u9p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 848w, https://substackcdn.com/image/fetch/$s_!1u9p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 1272w, https://substackcdn.com/image/fetch/$s_!1u9p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c63ff0-9c9d-4039-a248-06e4b600e33e_633x271.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">UMI-FT <a href="https://umi-ft.github.io/">research paper</a> architecture with explicit safety constraints in lower-level controllers.</figcaption></figure></div><h4>Generalization across hardware embodiment (*maybe)</h4><p>In principle, if the low-level controller completely abstracts the hardware, the higher-level brain&#8217;s functionality can be kept the same with different embodiments. Intuitively, you can reuse high-level policies if low-level layers abstract hardware, and you can improve low-level stability without retraining the learned models.</p><p>However, this intuitive point is difficult to verify because of how cognitive models are developed today. 
The end-to-end pixel &#8594; action policies always incorporate some amount of information about the embodiment, so it isn&#8217;t possible to train an abstract cognitive model. In practice, the foundation models of today train on <a href="https://www.pi.website/blog/pi0">cross-embodiment</a> data to obtain generalizable knowledge. To get to the bottom of this facet, we would need to understand what constitutes a cognitive model separate from embodiment, and that is not known yet as discussed in my previous world models post:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;651f537e-8ea4-4af0-ab46-6866064a066c&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The AI world models debate and its foreshadowing on robotics&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Writing about safe, efficient AI -- Robotics Ph.D. 
and founder&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-14T08:18:52.656Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Xry4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-ai-world-models-debate-and-its&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:184309659,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:4,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><h3>Closing thoughts</h3><p>With the end of Sense-Plan-Act, the new robotics north star is an end-to-end pipeline that does away with the need for any task-specific pipeline architecture or programming. However, today&#8217;s successful implementations tell a different story, and there are a number of intuitive reasons for this.</p><p>Foundation models excel at semantic, perceptual, and strategic reasoning, but they are mismatched to high-bandwidth, stability-critical motor control. A robust robotic architecture separates concerns into layers aligned with physical timescales and modeling regimes.</p><p>In this (part 1) article, we focused on standard visuomotor task execution. 
In part 2 of this series, we&#8217;ll look at how unexpected events and motor adaptation are handled in these architectures. After that, to continue this series, I&#8217;d also like to explore a standalone demonstration that can be published as an open-source repo that examines a few of these architectures and compares them fairly.</p><p>If you found this post interesting, please let me know in the comments, and share, and subscribe. Thanks for reading!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/the-architecture-behind-end-to-end/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/the-architecture-behind-end-to-end/comments"><span>Leave a comment</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/the-architecture-behind-end-to-end?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/the-architecture-behind-end-to-end?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[What von Neumann understood about the architecture of intelligence before we built AI]]></title><description><![CDATA[The Computer and the Brain anticipated both the successes and shortcomings of deep learning AI 70 years ago]]></description><link>https://www.avikde.me/p/what-von-neumann-understood-about</link><guid isPermaLink="false">https://www.avikde.me/p/what-von-neumann-understood-about</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Mon, 19 Jan 2026 19:17:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_hYZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My weekend read was &#8220;The Computer and the Brain&#8221;, an out-of-print book I picked up at the Strand Bookstore last year. John von Neumann wrote most of the contents in 1955 to prepare material for the Silliman lectures in 1956&#8212;an obligation that clearly meant a lot to him. He was diagnosed with bone cancer that year, but continued writing his notes in the hopes of being able to deliver them in some form. 
Tragically, he was never able to deliver the lectures, but his wife was able to collect and publish the partial manuscripts prefaced by a <a href="https://mathshistory.st-andrews.ac.uk/Extras/Von_Neumann_Silliman/">heart-wrenching letter</a>, and they would become his last words on these topics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_hYZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_hYZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_hYZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_hYZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_hYZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_hYZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg" width="600" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:81581,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/185086427?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_hYZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_hYZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_hYZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_hYZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb431bb7f-95fa-4073-989f-29376d717e30_600x800.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I&#8217;ve known of von Neumann&#8217;s huge legacy in modern computing since a college computer organization course, but I was stunned at how much he was able to extrapolate into ideas about computation in general. His writings, from a time when transistor-based computers were only just appearing, remain relevant after 70 years of exponential growth in computing technology. He wasn&#8217;t correct about everything&#8212;that would be impossible&#8212;but the ways in which he was wrong are even more revealing and thought-provoking. They anticipate the reason deep learning has been so capable, and also predict the architectural limits we are now running into&#8212;memory bottlenecks, brute-force scale, and energy-hungry intelligence. 
They also anticipate the future directions we can go in to overcome these deficiencies.</p><p>The book is very short and absolutely worth a read if you can pick it up from a library or used bookstore, but I had four broad and powerful takeaways that contextualized decades of development for me.</p><ol><li><p><strong>Scale &amp; memory:</strong>  Basic operations force massive memory movement</p></li><li><p><strong>Precise vs. statistical:</strong>  Deep learning (DL) escapes numerical fragility by becoming brain-like </p></li><li><p><strong>Depth vs. architecture:</strong>  DL substitutes scale for structural sophistication in the brain</p></li><li><p><strong>Representation &amp; substrate:</strong>  DL is rigid where the brain is fluid</p></li></ol><p>I&#8217;ll explain these four aspects below, but together they point to the same overall thesis:</p><p>Modern AI succeeded by replicating the statistical aspect of natural computation, but suffers from brute-force scaling inside an architecture that von Neumann already suspected was fundamentally mismatched to cognition.</p><h3>1. 
Scale &amp; memory</h3><p>As the book says, the principle of &#8220;one organ for each basic operation&#8221; necessitates memory for intermediate values, on top of instruction and data memory. Von Neumann predicted that computing systems built from simple primitives can only scale by also scaling memory.</p><p>Scalar CPU architecture is still very close to von Neumann&#8217;s artificial automaton. Post-von-Neumann architectures include systolic arrays (TPUs) and near-memory compute; GPUs are a bit of a hybrid with shared memory (scratchpads), and tiled matrix multiply (data reuse). Even heavily optimized post-von-Neumann machines are still dominated by data movement, because the algorithmic structure forces it.</p><p>Modern deep learning vindicates this: intelligence is achieved not through complex operations but through scale, which makes memory movement, not computation, the central bottleneck of contemporary hardware. We have been talking about the <a href="https://ieeexplore.ieee.org/document/10477550">AI memory wall</a> for a few years, but it was inevitable from these predictions 70 years ago.</p><p>A related aspect which von Neumann couldn&#8217;t have anticipated was the energetic impact of memory access. He did write about the energetic cost of logic operations, but today, moving data costs far more energy than computing on it: fetching an operand from off-chip DRAM costs on the order of 100&#215; the energy of a multiply. This pressure is driving technology development in near-memory compute, in-memory analog MACs, and optical interconnects. The architectural tension von Neumann identified now drives the economics<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> of AI hardware.</p><h3>2. Precise vs. statistical</h3><p>One of the central topics in the book is how digital computation needs very high precision because of the high arithmetic depth of repeated basic operations. 
If each operation has error &#949;, then after N steps the accumulated error can grow as O(N&#949;). Deep networks have <em>extreme arithmetic depth</em>, with up to hundreds of layers and trillions of operations. However, empirically, 4-bit quantization in deep learning <a href="https://developer.nvidia.com/blog/optimizing-llms-for-performance-and-accuracy-with-post-training-quantization/">works with nearly no drop in accuracy</a>.</p><p>Why doesn&#8217;t error compound the way von Neumann predicted?</p><p>The key is that von Neumann analyzed precise numerical methods (like solving equations or integrating trajectories), but neural networks are different in a couple of important ways:</p><ol><li><p>Noise is inherent in the training process, which yields a function approximator that is robust to input noise.</p></li><li><p>During accumulation, the errors are mixed across thousands of dimensions, clipped by nonlinear saturating functions, and averaged out statistically.</p></li></ol><p>The overall error is clamped and damped, and does not propagate in the way that von Neumann assumed.</p><p>Relatedly, von Neumann argued that the brain works with low precision (1-10 bits), and performs a different type of computation than digital computers (32-64 bits). He referred to the brain as performing &#8220;statistical computing&#8221;. So deep learning is not violating von Neumann, but it is <strong>occupying the biological side of his dichotomy</strong>.</p><h3>3. Depth vs. architecture</h3><p>Von Neumann emphasizes three biological facts about neurons:</p><ol><li><p><strong>Low precision</strong></p></li><li><p><strong>Low speed</strong> (~10 ms per spike, though they can respond slightly faster under extreme stimulation)</p></li><li><p><strong>Shallow circuits</strong></p></li></ol><p>We discussed the precision above; let&#8217;s dig into the others next. 
The nervous system is very slow, with each &#8220;layer&#8221; taking on the order of 10 ms to fire and reset (compared to digital lines changing state in &lt; 1 ns). This means that while a &#8220;deep&#8221; computation is feasible in digital hardware, it would be infeasible in a natural system.</p><p>The shallowness is also important: a crucial example in the book is that the retina does significant computation using three synapse layers, which is far shallower than <a href="https://towardsdatascience.com/image-classification-with-vision-transformer-8bfde8e541d4">modern Vision Transformer (ViT) encoders</a> (tens of layers, up to billions of parameters).</p><p>How is this possible? The answer is that a biological neuron is not a basic linear unit + nonlinearity; it is more like a <strong>small analog computer</strong>. Each neuron has temporal dynamics, neuromodulators, and plasticity rules. Its connections are even more complex: each neuron has hundreds of synapses and integrates their activations nonlinearly, potentially depending on spatial and geometric relations.</p><p>So the contrast is stark:</p><ul><li><p>The brain has <em>shallow</em> compositions of <em>slow</em> and <em>low-precision</em> units</p></li><li><p>Deep nets have <em>very deep</em> compositions of <em>very fast</em> and <em>medium-low</em>-precision units</p></li></ul><p>Von Neumann predicted that these fundamental differences in the basic blocks would result in different natural vs. artificial computing paradigms:</p><blockquote><p>Hence the logical approach and structure in natural automata may be expected to differ widely from those in artificial automata.</p></blockquote><p>Modern deep learning compensates for architectural simplicity with scale. Biology compensates for slow, noisy hardware with architectural sophistication and better primitives. 
This distinction strongly influences why our systems are large, power-hungry, data-hungry, and memory bound.</p><p>I hadn&#8217;t anticipated this connection when I started reading the book, but my article from last week also visits this architectural distinction from a world-model-representation perspective:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;c01c2482-9885-4fc0-b992-e9590bd3f4eb&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The AI world models debate and its foreshadowing on robotics&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Writing about safe, efficient AI -- Robotics Ph.D. and founder&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-14T08:18:52.656Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Xry4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/the-ai-world-models-debate-and-its&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:184309659,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:2,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-
78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Why did we choose scaling of simple units for computing? Among other reasons (as discussed in the previous article), deep learning was built around universality + scalability, not biological realism. Simple units have advantages: they are easy to parallelize, easy to implement on GPUs, and easy to map to silicon.</p><p>What does this mean for general artificial intelligence? Von Neumann suspected that digital logic gates were too primitive to model cognition efficiently, and with today&#8217;s technology it is certainly true that the brain&#8217;s performance on the order of 10 W cannot be matched even at much higher power.</p><h3>4. Representation &amp; substrate</h3><p>Von Neumann observes that representations of quantities which go through the nervous system may change from digital to analog and vice versa repeatedly. They can also have adaptive precision representations<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. In contrast, digital machines commit very early to fixed-width numbers everywhere (FP32, FP16, INT8, etc.), and even &#8220;mixed precision&#8221; is coarse and static.</p><p>These points seem to suggest another architectural dichotomy (not just in the connections between units, but also in how numerical quantities are represented). The brain has <em>adaptive</em> primitives, precisions, and numerical representations, whereas they are all <em>fixed</em> in the digital computing paradigm.</p><p>Is the answer analog computing? Von Neumann himself rejected naive analog computing due to its problems of scalability and reliability. 
The brain may be both powerful and efficient because it is <em>representationally flexible</em>, not because it is analog per se.</p><p>Neuromorphic computing is exactly about this axis, with conceptual departures such as event-driven computation, mixed analog/digital circuits, and co-located computation and memory. My knowledge of the field is limited and I am not sure that any of the existing research in that area truly captures what von Neumann was hinting at, but I suspect that in the long-term future of this publication, neuromorphic computing will come up again.</p><div><hr></div><p>Thanks for reading! Let me know if you&#8217;d suggest any related historical or modern writing on this topic, and please share and subscribe if you liked the essay.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/p/what-von-neumann-understood-about?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.avikde.me/p/what-von-neumann-understood-about?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>See, for example, the <a href="https://groq.com/newsroom/groq-and-nvidia-enter-non-exclusive-inference-technology-licensing-agreement-to-accelerate-ai-inference-at-global-scale">Groq-NVIDIA</a> deal and <a href="https://www.tomshardware.com/pc-components/ram/data-centers-will-consume-70-percent-of-memory-chips-made-in-2026-supply-shortfall-will-cause-the-chip-shortage-to-spread-to-other-segments">DRAM shortages</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>A nice example of this is the &#8220;average pulse frequency&#8221; interpretation of a sequence of quasiperiodic pulses. 
Coarse spike counts suffice for rough decisions, and temporal averaging increases accuracy automatically.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[The AI world models debate and its foreshadowing on robotics]]></title><description><![CDATA[Plus, five facets of comparison for the two approaches]]></description><link>https://www.avikde.me/p/the-ai-world-models-debate-and-its</link><guid isPermaLink="false">https://www.avikde.me/p/the-ai-world-models-debate-and-its</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Wed, 14 Jan 2026 08:18:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Xry4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Large language model (LLM)-based tools such as chatbots, coding assistants, and writing aids have become widely adopted and have had significant cultural and economic impact and utility. At the same time, the conversation continues about what kinds of progress these models represent and what their limitations may be. 
One of the central questions in this discussion is whether &#8220;scaling&#8221; improvements in LLMs (primarily achieved through larger models and larger training datasets) can lead to general intelligence, or whether additional architectural or conceptual advances will be required.</p><p>In parallel with these debates, especially on the heels of numerous announcements at CES 2026, the cultural focus is increasingly shifting toward robotics or &#8220;physical AI.&#8221; Is there a physical equivalent to this intellectual debate between scaling and structured models?</p><p>Here, we&#8217;ll go over some of the key aspects of this intellectual and conceptual spectrum, starting with the informational world, and examine the implications of the equivalent schools of thought in the physical world.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power}! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><em>This publication and this post contain the author&#8217;s personal thoughts and opinions only, and do not reflect the views of any companies or institutions.</em></p><h2>Today&#8217;s AI is a product of scaling a simple architecture (mostly)</h2><p>Breaking down this heading, by &#8220;today&#8217;s AI,&#8221; I&#8217;m referring to the most pervasive products, such as chatbots, search, and coding and writing assistants. These systems are typically based on large transformer architectures composed of many repeated layers and trained on vast datasets, with models today having hundreds of billions of parameters.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> In simplified form, these systems operate by mapping input tokens into embeddings, processing them through a stack of transformer blocks, and producing probability distributions over possible next tokens via a final linear projection and softmax layer.</p><p>Since the initial release of ChatGPT, the dominant trend in the development of these models has been to increase their size and the amount of data used for training, rather than to introduce fundamentally new architectural principles.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GOex!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GOex!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GOex!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GOex!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GOex!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GOex!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg" width="594" height="500" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:594,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Transformer model size over time&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Transformer model size over time" 
title="Transformer model size over time" srcset="https://substackcdn.com/image/fetch/$s_!GOex!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GOex!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GOex!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GOex!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586a164a-ff44-40ce-8149-f8542424601d_594x500.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" 
fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure from <a href="https://blogs.nvidia.com/blog/what-is-a-transformer-model/">NVIDIA</a> about LLM scaling</figcaption></figure></div><p>Given this architectural simplicity, the range of capability expressed by LLM-based tools is frankly impressive. Much of this capability arises from the interaction between large model size and extensive training data, rather than from task-specific design and bespoke computational structures.</p><p><a href="https://blog.samaltman.com/three-observations">Sam Altman&#8217;s early 2025 blog post</a> and the empirical observations of companies building LLMs added to the evidence for, and the expectation of, continued scaling of intelligence in this way. These observations led to a &#8220;scale is all you need&#8221; movement that has had enormous impact on our society and economy, with <a href="https://www.mckinsey.com/industries/private-capital/our-insights/scaling-bigger-faster-cheaper-data-centers-with-smarter-designs">$1.7 trillion of projected investment by 2030</a>.</p><p>The larger debate we&#8217;re looking at in this post is about the prediction that scale is <em>sufficient</em> (more below), but it is also important to ask if it is <em>necessary</em>. I.e. 
<strong>is scale </strong><em><strong>required</strong></em><strong> to exhibit the same progress?</strong> The answer is likely yes: as stated in a <a href="https://osf.io/preprints/psyarxiv/c5gh8_v1">Dec 2025 preprint by Quattrociocchi et al.</a>, when the models are restricted to the transformer architecture described above, it appears to be true that &#8220;their apparent intelligence emerges only under conditions of massive scale&#8221;.</p><p>Another natural question is <strong>why there has been so much investment into exploiting scaling</strong>, vs. exploration of other architectures. One reason is that progress has been consistent and predictable (even suggesting scaling &#8220;laws&#8221;, as in the Altman blog post), which enables predictable engineering and financial projections. By contrast, innovation and development of new architectures is a relatively unpredictable and risky process. Another prominent virtue is that simple architectures are much easier to integrate with other parts of the engineering stack, which has been key for the <a href="https://chipinsights.net/p/the-alphabet-soup-of-processors">adoption</a> of hardware acceleration for deep learning.</p><p>Many leading researchers such as Demis Hassabis, Geoffrey Hinton, and teams at OpenAI and Anthropic maintain that scaling remains a primary driver of progress.</p><h2>The other side of the AI debate</h2><p>Over the recent past, there has been an increasing number of arguments disagreeing with the claim that scaling is sufficient to get to arbitrary &#8220;intelligence.&#8221;</p><p>Per the March 2025 <a href="https://www.nature.com/articles/d41586-025-00649-4">findings</a> of the annual meeting of the AAAI, including responses from more than 475 members (67% of them academics),</p><blockquote><p>More than three-quarters of respondents said that <a href="https://www.nature.com/articles/d41586-023-00641-w">enlarging current AI systems &#8213; an approach that has been hugely successful</a> in 
enhancing their performance over the past few years &#8213; is unlikely to lead to what is known as artificial general intelligence (AGI).</p></blockquote><p>Well-respected AI researchers are starting to form the next wave of AI companies that try to encode some kind of &#8220;world model&#8221; or semantic understanding of the world: Dr. Fei-Fei Li&#8217;s World Labs generates images and videos but <a href="https://spectrum.ieee.org/fei-fei-li-world-labs">only via an intermediate representation of a 3D world</a>. Yann LeCun&#8217;s new startup <a href="https://techcrunch.com/2025/12/19/yann-lecun-confirms-his-new-world-model-startup-reportedly-seeks-5b-valuation/">AMI Labs is likely also building world models</a> via some form of his published JEPA work. Ilya Sutskever (one of OpenAI&#8217;s founders, who contributed substantially to Sam Altman&#8217;s perspective above) <a href="https://www.dwarkesh.com/p/ilya-sutskever-2">went on Dwarkesh&#8217;s podcast</a> and said that scaling alone would not carry us to AGI and that &#8220;something crucial is missing.&#8221; Cognitive scientist Gary Marcus <a href="https://garymarcus.substack.com/">frequently writes</a> about the need for symbolic reasoning for AI and is often in the thick of the debate on how to get there.</p><h3>What is a world model?</h3><p>There is at present no clearly victorious architecture for how to encode added structure in large AI models. 
Consider a few examples from the AI world:</p><ul><li><p><a href="https://www.worldlabs.ai/">World Labs</a>, whose product generates consistent images and video, would define it as metric information about a 3D scene</p></li><li><p>Many AI researchers use a working definition of a world model as a <a href="https://itcanthink.substack.com/p/what-are-robot-world-models">(potentially latent-space) dynamical model that predicts how the state of the world evolves under actions</a>.</p><ul><li><p>Ha and Schmidhuber&#8217;s <a href="https://arxiv.org/pdf/1803.10122">2018 &#8220;World Models&#8221; paper</a>, which builds on Schmidhuber&#8217;s work from the early 1990s, uses the working definition of &#8220;predicting future sensory data given our current motor actions&#8221;</p></li><li><p>Yann LeCun proposes learning and predicting latent-space dynamics in his JEPA research (papers 2022-2025); crucially, the projection to latent space is also learned from data, making it more general but less grounded in physical laws</p></li><li><p>The 1x world model is <a href="https://www.1x.tech/discover/world-model-self-learning">described in Jan 2026</a> as having latent-space prediction capability and being used to generate predicted future video states</p></li><li><p><a href="https://arxiv.org/pdf/2506.01622">DeepMind&#8217;s 2025 paper</a> also seeks a &#8220;predictive model of its environment&#8221;. In the paper it is a Markov process, but for a continuous system such as a robot, it would be continuous or discretized dynamics governed by physics. 
It does not, however, specify how one would design architectures to take advantage of world models: &#8220;Future work should explore developing scalable algorithms for eliciting these world models and using them to improve agent safety.&#8221;</p></li></ul></li></ul><p>Zooming out to broader science, models have been developed and used in almost all fields; biologists have been <a href="https://openlibrary.org/books/OL2049287M/The_organization_of_learning">discovering models for navigation</a> in animal brains, physicists have been developing models for the behavior of the universe from quantum to astronomical scales for centuries, civil engineers have been using models of mechanics to build our houses and bridges, etc. Gary Marcus <a href="https://garymarcus.substack.com/p/generative-ais-crippling-and-widespread">defines</a> a cognitive world model as &#8220;a computational framework that a system (a machine, or a person or other animal) uses to track what is happening in the world &#8230; persistent, stable, updatable (and ideally up-to-date) internal representations of some set of entities within some slice of the world.&#8221; Each of these parties would likely have different opinions on models of the world / universe that AI should be imbued with.</p><p>In this post, we&#8217;ll stay focused on whether the added structure is important, but not discuss the relative merits of these varied proposals. 
(That is a potential topic for future posts; make sure to subscribe to get notified when they are published.)</p><h3>Why do we need world models?</h3><p>The critical view is that LLMs are designed to predict what to do next, but are not designed to build an underlying semantic understanding, and that there are many examples of errors (or &#8220;hallucinations&#8221;) that can ultimately be traced back to this:</p><p>LLMs can <a href="https://garymarcus.substack.com/p/generative-ais-crippling-and-widespread">parrot rules of chess but will make illegal moves</a> at the same time; they do not generalize well to <a href="https://saanyaojha.substack.com/p/the-man-who-cant-be-moved">out-of-training scenarios or under uncertainty</a> and can produce unpredictable responses to uncommon inputs such as <a href="https://www.plough.com/en/topics/life/technology/computers-cant-do-math">SolidGoldMagikarp</a>; and they exhibit &#8220;<a href="https://arxiv.org/pdf/2408.06518v3">semantic leakage</a>&#8221; of concepts and semantics in their input streams, with <a href="https://www.fox13now.com/news/local-news/summit-county/how-utah-police-departments-are-using-ai-to-keep-streets-safer">real-world impacts on usage of AI for policing</a>. 
While capabilities of LLMs do keep increasing, there is concern that errors such as these cannot be universally eradicated without an architectural shift.</p><h2>Is there an equivalent debate in robotics?</h2><p>Humanoid robotics in particular has risen prominently into the <a href="https://www.cnbc.com/2026/01/09/humanoid-robots-take-over-las-vegas-at-ces-tech-touts-future-of-ai.html">cultural</a> and <a href="https://techcrunch.com/2025/09/16/figure-reaches-39b-valuation-in-latest-funding-round/">economic</a> consciousness in the last few years. Humanoids have been featured at <a href="https://www.nvidia.com/en-us/on-demand/session/gtc24-s62542/">NVIDIA keynotes for about two years</a> now, clearly signaling that the time has come for robotics companies to show their products and pursue mass-market adoption. While the field of robotics has existed for a long time, the demonstrated capabilities have undeniably improved a great deal alongside this increased public exposure.</p><p>Does the same architectural divide we just discussed for LLMs also exist in robotics? 
Less is known (much less agreed upon) about the best way to develop advanced capabilities in these robots, but we can use public information from some companies that have made product announcements to infer some patterns:</p><ul><li><p>The Boston Dynamics CEO <a href="https://www.businessinsider.com/huamnoid-robots-manufacturing-deployment-timeline-robert-playter-ceo-interview-2026-1">says</a> that they &#8220;need to be able to bring a new task to bear in a day or two &#8230; because, I think in a factory, there&#8217;s literally hundreds of tasks and the tasks evolve,&#8221; and their <a href="https://www.cbsnews.com/news/boston-dynamics-ai-powered-humanoid-robot-learning-factory-work-60-minutes-transcript/">60 Minutes feature</a> shows the ability to rapidly deploy motion capture or VR demonstration data to their Atlas robot</p></li><li><p>Figure describes its &#8220;<a href="https://www.figure.ai/news/project-go-big">Project Go-Big</a>&#8221; as an effort to collect human demonstration data in the form of first-person video for pre-training<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> a navigation model</p></li><li><p>In Oct 2025, 1x <a href="https://www.wsj.com/tech/personal-tech/i-tried-the-robot-thats-coming-to-live-with-you-its-still-part-human-68515d44">described</a> its plan to collect teleoperated demonstration data with its robot in people&#8217;s homes for continued training of its AI model, and released an <a href="https://techcrunch.com/2026/01/13/neo-humanoid-maker-1x-releases-world-model-to-help-bots-learn-what-they-see/">update in Jan 2026</a> suggesting learning from internet-scale videos as demonstrations, followed by RL in simulation</p></li></ul><p>I want to note that all these companies have very intelligent researchers and engineers on their staff, and it is quite likely that there is more going on in these particular demos; I only include these 
specific reference points as context to pick out broad themes. Some emerging patterns are that (a) the rate at which different tasks are demonstrated is a high priority for these companies, (b) many of them are looking to pre-training with motion data collected from humans, and (c) this is followed by post-training using reinforcement learning (most likely in simulation) where the system&#8217;s reward includes matching the demonstration.</p><p>My rough summary here is largely echoed by Rodney Brooks in his <a href="https://rodneybrooks.com/why-todays-humanoids-wont-learn-dexterity/">2025 post on humanoid robot dexterity</a>:</p><blockquote><p>How the humanoid companies and academic researchers have chosen to do this is largely through having a learning system watch movies of people doing manipulation tasks, and try to learn what the motions are for a robot to do the same tasks. In a few cases humans teleoperate a robot, that they can see, along with the objects being manipulated ...</p></blockquote><h3>A robotics parallel of LLM development</h3><p>Very roughly, the training processes for both status-quo approaches have similar-looking steps:</p><ul><li><p><strong>pre-training</strong> - reading internet-scale text (LLMs), vs. watching internet-scale human demonstration video or motion data (robots);</p></li><li><p><strong>post-training</strong> - RLHF and its modern equivalents (LLMs), vs. RL in simulation followed by sim-to-real porting and deployment (robots)</p></li></ul><p>With this grounding, we can ask <strong>whether robotics applications will run into the same problems and debates</strong> as we discussed for LLMs above.</p><p>One unknown is whether motion data is the best analogue of text data. Rodney Brooks articulates some concerns about this in his dexterous manipulation essay, suggesting that tactile sensing data is needed (but internet-scale tactile sensing data, or any other kind of robot data, doesn&#8217;t exist). 
It is likely that all the robots will <a href="https://www.figure.ai/news/introducing-figure-03">include tactile sensors</a> in some form, but it isn&#8217;t clear yet how they will fit into this large-data paradigm of human demonstration.</p><p>The larger question is whether a navigation capability trained with motion data will generalize to unseen and unexpected situations, since it is not designed to encode an explicit understanding of &#8220;objects&#8221; or &#8220;inertias&#8221; or &#8220;positions&#8221;. This concern closely mirrors the concerns about semantic understanding in LLMs. It is likely that the rate of this class of error will go down with larger models trained with more data (effectively, the scaling argument). To accomplish that goal, the &#8220;<a href="https://itcanthink.substack.com/p/how-can-we-get-enough-data-to-train">robot data gap</a>&#8221; will need to be closed, which will take a lot of compute for data generation and for training larger models, due to the large dimensionality of the sensory and action spaces in robotics.</p><p>It is also relatively more difficult to &#8220;scale up&#8221; in robotics for several reasons. First, latency and real-time reaction are much more important than in a chatbot setting, and so increasing model size at the cost of latency is not viable. In <a href="https://www.figure.ai/news/helix">Figure&#8217;s Feb 2025 blog post</a>, we can see that a 7B parameter VLM is used, at a time when much larger (and presumably more accurate) models were available, and 1x states that <a href="https://www.1x.tech/discover/world-model-self-learning">11 seconds of thinking are required for 5 second tasks</a>. Second, as Chris Paxton has written about <a href="https://itcanthink.substack.com/p/how-can-we-get-enough-data-to-train">many</a> <a href="https://itcanthink.substack.com/p/what-are-the-data-scaling-laws-for">times</a>, getting diverse and useful data to feed a larger model poses many challenges. 
Third, robots need to carry their own battery packs, and so adding a larger GPU to run larger models introduces runtime and thermal management concerns.</p><p>On the other hand, the architecture (albeit with many details glossed over) seems to be consistent across many tasks and does not require too many architectural decisions to be made or parameters to be tuned (except for training metaparameters). It also allows for a myriad of types of demonstrations to be stood up quickly for garnering buy-in and support from customers or investors, which is a significant benefit. This is understandably a parallel of some of the observations that led to the scaling-based improvements of LLMs.</p><h3>World models in robotics</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xry4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xry4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Xry4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Xry4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Xry4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xry4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png" width="480" height="320.1098901098901" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:480,&quot;bytes&quot;:629810,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/184309659?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Xry4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Xry4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!Xry4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Xry4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce84c4d-9a7c-46ea-b3c6-ed6b05380fa5_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Generated by ChatGPT</figcaption></figure></div><p>First, we must observe that the post-training process described 
above will typically use simulation environments (with simulated physics) for the training process. Although the approach appears to be model-free, the properties of the simulator (which itself uses physics models) are implicitly embedded into the learned policy.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> The 2025 DeepMind paper referenced above suggests that it may be possible to prove that this implicitly captured information can be used to extract an explicit physics model after training.</p><p>So, does that mean we should put world models out of mind and learn an implicit one as needed (or not)? Well, this is a very inefficient way to learn physics equations and parameters: Euler-Lagrange equations, and classical <a href="https://en.wikipedia.org/wiki/System_identification">system ID</a> or <a href="https://en.wikipedia.org/wiki/Adaptive_control">adaptive control</a> methods may be able to capture the same model much more easily and in a way that generalizes more readily. RL in general can require a large amount of training for the results it produces because <a href="https://itcanthink.substack.com/p/the-limits-of-reinforcement-learning">rewards are typically sparse</a>. In other words, an RL-trained policy needs to possess &#8220;<a href="https://garymarcus.substack.com/p/generative-ais-crippling-and-widespread">a lot of knowledge, and in some ways far more than most, if not all, humans</a>,&#8221; to reach human-level performance on a specific task. 
Of course, humans rely on enormous evolutionary and developmental pretraining, and encoding that pretraining into specialized structures is exactly the pro-world-models argument in the debate.</p><p>In terms of the models themselves, for tasks like locomotion, Newtonian physics is very well understood, and roboticists have been building on it to <a href="https://inria.hal.science/hal-02487855/file/Chapter.pdf">develop and use models like ZMP (zero moment point) and LIP (linear inverted pendulum) for decades</a>. For more abstract control systems, the concept of a &#8220;plant model&#8221; in control theory is not dissimilar to the abstract state-prediction models referred to in the section above.</p><p>Classical methods that use these models include trajectory optimization subject to the model and model-predictive control. These can impose constraints on future states, so that within the bounds of the model&#8217;s accuracy, some aspects of safety can be encoded in a way that isn&#8217;t possible otherwise.</p><h2>How to compare the two approaches</h2><p>Now that we can recognize the &#8220;world model&#8221; debate in applications for informational and physical AI, it&#8217;s helpful to know (in rough, broad strokes) how to compare the two strategies from a number of perspectives:</p><ol><li><p><strong>Performance: </strong>Can the method produce results that are compelling? There are umpteen benchmarks to compare language models. The next generation of world-model-equipped LLMs isn&#8217;t here yet, so we&#8217;ll have to wait a little while to see how they stack up. There aren&#8217;t robotics benchmarks of the sort yet, though some <a href="https://generalrobots.substack.com/p/benjies-humanoid-olympic-games">informal efforts are underway</a>.</p></li><li><p><strong>Scalability and time-to-market: </strong>This is a huge advantage of scaling a simple architecture. 
Deep neural networks with consistently-repeating matrix multiplication and reduction primitives have been mapped to SIMT processors like GPUs and systolic array processors (NPUs, TPUs) with incredible performance gains. At the moment there is not even enough information about non-trivial architectures to consider mapping them to computational hardware. It is also possible that world models can be mapped into the existing computational frameworks (and we can assume that the first generation of them will have to do so to compete). Eventually, if the computations are quite different, modified paradigms and accelerators may be needed, and scaling those may require more care and thought than the straightforward process we have followed for scaling LLMs. Based on the current state of language models and humanoid robotics as recapped above, it is clearly easier to get initial proofs-of-concept working with model-free approaches that scale a simple architecture.</p></li><li><p><strong>Computational efficiency: </strong>Newton&#8217;s equations describe the motion of bodies with very few parameters in great generality, and it is impractical to capture them with a &#8220;transformer-like&#8221; structure without a significantly higher number of parameters. This is especially true where equations are discontinuous, which happens in robotics problems like locomotion and manipulation. AI is currently up against the so-called &#8220;<a href="https://arxiv.org/abs/2403.14123">memory wall</a>&#8221; because these models need to be so large, and the most recent innovations and <a href="https://groq.com/newsroom/groq-and-nvidia-enter-non-exclusive-inference-technology-licensing-agreement-to-accelerate-ai-inference-at-global-scale">movements</a> in ML accelerators have focused on addressing it. 
Utilizing appropriate models with differently-architected communications may completely sidestep this memory wall, as well as drastically improve the efficiency<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> of equivalent computations.</p></li><li><p><strong>Generalization:</strong> It should be clear that some of what needs to be learned for robot motion is already captured by very applicable and general models that have been known for centuries, and the same holds for biologists, cognitive scientists, and psychologists in their fields. Ilya Sutskever, one of the architects of the current LLM era, <a href="https://www.dwarkesh.com/p/ilya-sutskever-2">says</a> that their structure is weak at generalization and that generalizing in the way that humans can needs new architectures. The aforementioned DeepMind paper also cites domain adaptation and generalization to unseen tasks as something that could be improved by using world models.</p></li><li><p><strong>Safety:</strong> We&#8217;ve discussed hallucinations in this post already, and the aforementioned Quattrociocchi paper makes an argument about the reliability of results from LLMs. The point of concern is how the system will react to unseen circumstances and whether it can extrapolate in reasonable ways. 
It may be especially important to have mechanisms for bounding the possible range of actions the robot can take and for explaining its decisions.</p></li></ol><p>I don&#8217;t think there is sufficient information to score the approaches yet, but it seems that model-based approaches may offer advantages in generalization and interpretability, while model-free scaling currently dominates in deployment speed and tooling maturity.</p><h2>Closing thoughts</h2><p>Before closing out this article, I must point out that this &#8220;divide&#8221; is really a spectrum: there is likely a rich space of hybrids of the two approaches, which may consist of hierarchical structures combining the strengths of each. Deep learning excels at parsing and summarization of text and images, automatically finding the most appropriate dimensionality reduction techniques. World models, when coupled with methods that know how to use them, are strong at generalization and abstraction, and can produce very computationally-efficient algorithms.</p><p>In future posts, I plan to write about any new developments on the informational or physical sides that demonstrate usage and adoption of world models, or of new hybrid architectures. I also plan to write some posts where I construct simple scenarios to fairly evaluate competing architectures along the different metrics above. 
Last but not least, I plan to go into more detail on computational hardware acceleration of non-trivial architectures.</p><p>I believe this is going to be a recurring topic in this publication, so make sure to subscribe and share if you found this interesting.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The underlying transformer has seen performance-related tweaks such as grouped-query attention (GQA), and more recent &#8220;mixture of experts&#8221; models create a bit of a tree-like structure by combining different models. 
Also, it <a href="https://x.com/fchollet/status/1802785277758591054">can be argued</a> that tool and code interpreter usage by LLMs constitutes a neurosymbolic architecture. However, it is fair to say that none of these tweaks represents the headlining scaling strategy for leading AI companies.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>In this context, I believe the post-training component is likely to be reinforcement learning (RL) in simulation. A <a href="https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback">similar approach was used to post-train early LLMs</a>, and is now enhanced with a multistep process, though pure RL is still used in some applications.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>In fact, &#8220;differentiable simulators&#8221; are increasingly used in RL to allow gradient-based training algorithms to work more easily. This is an interesting topic that we will explore more deeply in a future post, so stay subscribed for that.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>The energetic cost of a DRAM access is <a href="https://mlsysbook.ai/book/contents/core/hw_acceleration/hw_acceleration.html">orders of magnitude higher</a> than that of a multiply-accumulate operation. 
Systolic architectures require fewer accesses to multiply a whole matrix than conventional scalar architectures, but with the architecture being equal, fewer weights and smaller models would undeniably reduce computational energetic cost.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Model-predictive control of RoboBee flapping flight]]></title><description><![CDATA[Hierarchical model-predictive and data-driven control method published in IJRR (2022)]]></description><link>https://www.avikde.me/p/model-predictive-control-of-robobee</link><guid isPermaLink="false">https://www.avikde.me/p/model-predictive-control-of-robobee</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Wed, 24 Dec 2025 16:11:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/RV9CJE_unHk" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this post, we&#8217;ll go over a method to control the flight of a <a href="https://wyss.harvard.edu/technology/robobees-autonomous-flying-microrobots/">RoboBee</a> in a way that should be approachable for a broad audience. In keeping with this publication&#8217;s focus on energy-efficient robotics, this method was designed to run on extremely low-power computational hardware, as we will see.</p><p>Just to provide brief context,</p><ul><li><p>the RoboBee hardware was at this point fairly mature, and on the &#8220;<a href="https://www.researchgate.net/publication/261354075_Design_Fabrication_and_Modeling_of_the_Split_Actuator_Microrobotic_Bee">Split Dual-Actuator Bee</a>&#8221; generation;</p></li><li><p>the state-of-the-art flight controller was a capable, but task-specific <a href="https://seas.harvard.edu/news/2013/05/robotic-insects-make-first-controlled-flight">hovering controller</a> with limited generalizability.</p></li></ul><p>The goal for this project was to develop a controller that could be easily generalized to more complex tasks using modern control methods. 
The resulting paper<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> with <a href="https://www1.villanova.edu/university/engineering/faculty-research/sports-and-performance/Faculty-Researchers/biodetail.html?mail=rebecca.mcgill@villanova.edu&amp;xsl=bio_long">Dr. Rebecca McGill</a> demonstrated several advances: better operation away from an upright configuration, the ability to stabilize tasks like following a desired path or executing more dynamic behaviors like perching and flipping, and robustness to suboptimal gain tuning and manufacturing variability.</p><p>Here are some hovering clips (short 32s video; no audio):</p><div id="youtube2-RV9CJE_unHk" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;RV9CJE_unHk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/RV9CJE_unHk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The remainder of this post explains how this result was achieved and discusses potential future extensions of the idea.</p>
<h2>Background</h2><h3>RoboBee and flapping flight</h3><p>The RoboBee is a 100mg flapping robot <a href="https://www.harvardmagazine.com/science-technology/harvard-robot-bees-future-robotic-engineering">developed by Dr. Rob Wood</a>, capable of hovering and controlled flight. To put it in perspective, a US nickel weighs 5g, the equivalent of 50 RoboBees. Having spent a lot of time in the Harvard microrobotics lab fabricating them, I can say without exaggeration that a sneeze can literally destroy weeks of work.</p><p>Like a family of similarly fabricated robotic systems developed at the Harvard microrobotics lab, RoboBees are actuated by piezoelectric bending actuators. The piezoelectric effect is commonly seen in the working of microphones, which convert vibrations created by acoustic pressure waves into electric signals. Piezoelectric materials also work in reverse, converting electric pulses into vibratory motion. The RoboBee&#8217;s bending actuators are constructed similarly to a bimetallic strip, converting slight expansion and contraction of the piezoelectric material into a bending motion.</p><p>Generally, the piezoelectric actuators produce very small motions that need to be amplified to produce the requisite aerodynamic work. After the conversion to a bending motion, the output also passes through a transmission that converts the small bending (translational) motion into a rotational motion. 
In a previous post, I went into the details of how this transmission works, and a project I worked on to optimize it.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;8765e2d3-bccc-43a6-8afe-49948f2ba8a5&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Using models to design a RoboBee&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Writing about safe, efficient AI -- Robotics Ph.D. and founder&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-12-22T00:00:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!_Aa4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.avikde.me/p/template-based-design-robobee&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:182198523,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>OK, now we are at the stage of converting electrical signals into 
rotational motion of the wing. The wing itself is attached to the end of the transmission via a passive hinge, so that when the base of the wing is driven, the wing not only flaps but also pivots about its hinge, thereby changing its pitch, or angle of attack. This motion is common among flapping animals:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N2Xl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N2Xl!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 424w, https://substackcdn.com/image/fetch/$s_!N2Xl!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 848w, https://substackcdn.com/image/fetch/$s_!N2Xl!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 1272w, https://substackcdn.com/image/fetch/$s_!N2Xl!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N2Xl!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif" width="480" height="270" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:270,&quot;width&quot;:480,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N2Xl!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 424w, https://substackcdn.com/image/fetch/$s_!N2Xl!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 848w, https://substackcdn.com/image/fetch/$s_!N2Xl!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 1272w, https://substackcdn.com/image/fetch/$s_!N2Xl!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2409ce6-c2e1-4ff0-83e5-f694a94fa5c6_480x270.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Hummingbird hovering showing changing wing pitch over flapping cycle. <a href="https://www.youtube.com/watch?v=RtUQ_pz5wlo">Source: NatGeoWild</a></figcaption></figure></div><p>RoboBee&#8217;s clever design allows the wing pitch to change passively as the wing flaps, i.e. 
only one actuator is needed per wing to obtain something resembling the complex wing motion of the hummingbird above:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bM5H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bM5H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 424w, https://substackcdn.com/image/fetch/$s_!bM5H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 848w, https://substackcdn.com/image/fetch/$s_!bM5H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 1272w, https://substackcdn.com/image/fetch/$s_!bM5H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bM5H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png" width="484" height="365.06484641638224" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:663,&quot;width&quot;:879,&quot;resizeWidth&quot;:484,&quot;bytes&quot;:79213,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/182263726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bM5H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 424w, https://substackcdn.com/image/fetch/$s_!bM5H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 848w, https://substackcdn.com/image/fetch/$s_!bM5H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 1272w, https://substackcdn.com/image/fetch/$s_!bM5H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa18ada71-11fe-44f9-84ed-4e893a6c55eb_879x663.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">A schematic showing the construction of a &#8220;half-RoboBee,&#8221; where the piezoelectric bending actuator, transmission, and both wing joints can be seen. Figure from <a href="https://scholar.google.com/citations?view_op=view_citation&amp;hl=en&amp;user=m-A4ZdEAAAAJ&amp;sortby=pubdate&amp;citation_for_view=m-A4ZdEAAAAJ:ODE9OILHJdcC">this paper</a>.</figcaption></figure></div><h3>Modeling RoboBee&#8217;s flight</h3><p>A model of the motion produced is very important for understanding how to use the available wing input signals to reach a desired goal. There is considerable debate over model-based vs. model-free methods (the latter eschew models in favor of gathering a lot of data from the black-box system and approximating its behavior). 
The recent growth in computational power has increased the temptation to abandon models, though many sim2real reinforcement learning approaches still use models to develop the simulation.</p><p>In the case of RoboBee, the difficulty with pursuing a fully model-based method is that aerodynamics is quite difficult to model. Nonetheless, some work in the early 2010s on <a href="https://en.wikipedia.org/wiki/Blade_element_theory">blade-element modeling</a> has proved quite useful for understanding the relation of RoboBee wing motion to the produced lift and drag forces. Using that model, we developed a RoboBee simulator, which is open-sourced<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. We will discuss the software supporting this work further below, but here is an animation of some fixed control inputs (similar to the 2013 flight control work) producing simulated flapping flight, complete with passive wing pitching (short 17s video; no audio):</p><div id="youtube2-Qm0_yIEXycU" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Qm0_yIEXycU&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Qm0_yIEXycU?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The disadvantage of the model above is that it is very complex and cannot be used directly to develop a controller. However, the other components of RoboBee dynamics (excluding how the wing produces lift and drag) are well explained by Newtonian physics. 
In this latter area, there is a great degree of similarity to the control of legged robots.</p><p>Typically the worlds of flapping flight and legged control do not overlap, but there are a number of similarities that motivate the use of similar methods. They are both</p><ul><li><p><strong>cyclic</strong> (though in the RoboBee case, the wings are assumed massless and flap so fast that their dynamics are considered decoupled from the body);</p></li><li><p><strong>mechanics-dominated</strong> (it is very important to consider the physics of ground interactions and aerodynamics); and</p></li><li><p><strong>underactuated</strong> (we have fewer actuators than degrees of freedom, so the motion cannot be commanded directly, and typically in these scenarios some amount of &#8220;lookahead planning&#8221; is required).</p></li></ul><p>In the legged robotics field, there is a long tradition of using simplified models to aid in control development (so-called &#8220;spring-mass&#8221; models), as I have <a href="https://www.avikde.me/p/jerboa-hopping-video">discussed</a> <a href="https://www.avikde.me/p/vertical-hopper-compositions">before</a>. In this paper, we introduce for the first time an equivalent for RoboBee-like flapping flight.</p><h3>Model-predictive control (MPC)</h3><p>As discussed above, underactuated scenarios typically require predicting something about the future behavior of the system in order to decide which inputs to supply. 
As a simple example, how should the cart be moved in order to get the attached pole to swing up?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6tS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6tS3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 424w, https://substackcdn.com/image/fetch/$s_!6tS3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 848w, https://substackcdn.com/image/fetch/$s_!6tS3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 1272w, https://substackcdn.com/image/fetch/$s_!6tS3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6tS3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif" width="562" height="421.20421052631576" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:712,&quot;width&quot;:950,&quot;resizeWidth&quot;:562,&quot;bytes&quot;:435249,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/182263726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6tS3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 424w, https://substackcdn.com/image/fetch/$s_!6tS3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 848w, https://substackcdn.com/image/fetch/$s_!6tS3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 1272w, https://substackcdn.com/image/fetch/$s_!6tS3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e3ca23e-e11b-47ab-a8d3-87265f13507c_950x712.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Animation borrowed from <a href="https://commons.wikimedia.org/wiki/File:Cart-pole_swing_up.gif">here</a> per the <a href="https://en.wikipedia.org/wiki/en:Creative_Commons">Creative Commons</a> <a href="https://creativecommons.org/licenses/by-sa/4.0/deed.en">Attribution-Share Alike 4.0 International</a> license.</figcaption></figure></div><p>A global understanding of the future behavior of the system can be summarized in a so-called <a href="https://en.wikipedia.org/wiki/Value_function">value function</a>, and knowing this function can tell us exactly which way we should move to get to our goal from any state.</p><p>The problem is that the value function is not &#8220;known.&#8221; It can be estimated by exhaustively poking and prodding the system (which is an approach that resembles reinforcement learning). 
However, when we know of a dynamical model for the system, it is sensible to use it, because treating the dynamics as fixed greatly reduces the dimensionality of the control problem.</p><p>Model-predictive control (MPC) tries to create a small local approximation of the value function <em>online</em> by using the future state of the system over a short prediction horizon (subject to a model) as a proxy for the value of the current state. MPC is by now an old technique, and it is widely used in industrial process automation, aerospace, etc.</p><h2>Approach: model-based MPC and model-free inverse dynamics</h2><p>Here is the overall plan:</p><ol><li><p>Develop a simplified model capturing the desired behavior: for this step, I noted that we do not care about the heading, but instead simply that the robot stays upright.</p></li><li><p>&#8220;Anchor&#8221; the behavior onto the RoboBee: convert the model&#8217;s inputs into signals that get sent to the actuators.</p></li></ol><p>The system architecture figure below makes this explicit. The purple &#8220;flying brick&#8221; is the model, whose future states we can predict for known inputs. The MPC can then effectively back out the best inputs <em>for that model</em> to get to a desired state. 
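</p><p>As a rough illustration of how an MPC can back out inputs for a simple model, here is a minimal Python sketch. It is not the controller from the paper: it assumes a toy &#8220;flying brick&#8221; that only moves vertically, and it swaps a proper optimizer for a brute-force search over constant inputs. The names <code>step</code>, <code>horizon_cost</code>, <code>mpc_input</code>, and <code>simulate</code>, along with the time step, horizon, candidate inputs, and cost weights, are all illustrative assumptions.</p><pre><code class="language-python">"""Toy receding-horizon (MPC-style) controller for a 1-D point mass.

Everything here is an illustrative assumption, not the paper's controller.
"""

DT = 0.01       # time step in seconds (assumed)
HORIZON = 20    # number of lookahead steps (assumed)
# Candidate net vertical accelerations in m/s^2, from -5 to 5 (assumed)
CANDIDATES = [0.1 * i for i in range(-50, 51)]


def step(state, accel):
    """Advance (height, velocity) by one Euler step under a net acceleration."""
    z, vz = state
    return (z + vz * DT, vz + accel * DT)


def horizon_cost(state, accel, z_goal):
    """Predict HORIZON steps ahead with a constant input and sum a quadratic cost."""
    cost = 0.0
    for _ in range(HORIZON):
        state = step(state, accel)
        z, vz = state
        cost += (z - z_goal) ** 2 + 0.01 * vz ** 2 + 1e-5 * accel ** 2
    return cost


def mpc_input(state, z_goal):
    """Return the candidate input with the lowest predicted cost."""
    return min(CANDIDATES, key=lambda a: horizon_cost(state, a, z_goal))


def simulate(z_goal=0.1, steps=300):
    """Closed loop: re-plan at every step, apply the chosen input for one step."""
    state = (0.0, 0.0)
    for _ in range(steps):
        state = step(state, mpc_input(state, z_goal))
    return state
</code></pre><p>Calling <code>simulate()</code> runs the closed loop from rest and returns the final height and velocity, which should end up close to the goal height. At every time step the controller re-plans from the current state and applies only the first step of its plan, which is what distinguishes this from computing a single open-loop trajectory. A practical MPC would typically optimize a full input sequence with a dedicated solver rather than search over constant inputs.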
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J6jd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J6jd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 424w, https://substackcdn.com/image/fetch/$s_!J6jd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 848w, https://substackcdn.com/image/fetch/$s_!J6jd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!J6jd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J6jd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg" width="1456" height="1012" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1012,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 
1&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 1" title="Figure 1" srcset="https://substackcdn.com/image/fetch/$s_!J6jd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 424w, https://substackcdn.com/image/fetch/$s_!J6jd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 848w, https://substackcdn.com/image/fetch/$s_!J6jd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!J6jd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93af0db4-34fd-4801-9736-4908d453141c_1704x1184.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">System architecture</figcaption></figure></div><p>However, as the blue and green arrows in the figure show, that only partially solves our problem because the RoboBee is not a flying brick. To address the gap, we need to define the operations represented by the arrows:</p><ul><li><p><strong>State projection (blue arrow): </strong>This process is relatively simple in this instance, because the state of the flying brick is effectively a subset of the state of the actual RoboBee. The brick has an elevation and body tilt angles just like the RoboBee, and we simply project the RoboBee&#8217;s coordinates onto those of the flying brick.</p></li><li><p><strong>Inverse dynamics (green arrow): </strong>The other direction is more complex: in essence, we want to go from the abstracted thrust/roll/pitch torque inputs for the brick to RoboBee wing actuator signals. 
This process is complex for a couple of reasons:</p><ul><li><p>The mapping is much more complex than any kind of projection, because it chains several components:</p><ul><li><p>Wing voltage &#8594; actuator motion (depends on piezoelectric actuator electrical and mechanical properties)</p></li><li><p>Actuator motion &#8594; wing base motion (depends on transmission and its stiffness)</p></li><li><p>Wing base motion &#8594; wing motion (depends on hinge and wing mechanical properties)</p></li><li><p>Wing motion &#8594; reaction forces and torques (depends on wing aerodynamic interactions, ground effect, etc.)</p></li></ul></li><li><p>Manufacturing variability makes this mapping inexact (if you manufactured two RoboBees, they might require different wing signals to produce the same wing motion)</p></li></ul></li></ul><p>For these reasons, models have limited utility for the green arrow, so the paper proposed a model-free method for that part.</p><h2>Model-based MPC</h2><h3>Template: upright rigid body</h3><p>First, we need to pick the model. As the saying goes, all models are wrong, but the goal here is to capture the most important parts of the dynamics and the objective.</p><p>The RoboBee&#8217;s wings are very light, and so most of its mass is truly contained in its body (more on this below). Dynamically, this is well approximated by the flying brick, with no other moving parts.</p><p>To capture the objective, we note that we do not particularly care about the heading of the RoboBee when we just want it to hover, or fly controllably. This allows us to effectively remove one degree of freedom from our specification of the objective, and capture the state of the flying brick as follows:</p><ul><li><p>For the position, we use the <em>(x, y, z)</em> Cartesian coordinates of the center of mass, as expected.</p></li><li><p>For the orientation, we only look at the components of the &#8220;upright vector&#8221; (a vector pointing up in the body frame). 
Note that the hovering objective can be stated simply as wanting the upright vector to point vertically up.</p></li></ul><h3>Waypoint tracking MPC</h3><p>We write the dynamical equations for the flying brick using the Newton-Euler equations for the motion of a rigid body. After a small approximation as described in the paper, we get</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\ddot p = s T - g e_3, ~~\n\\ddot s = -\\hat s B \\tau,&quot;,&quot;id&quot;:&quot;FBUJCMUYRK&quot;}" data-component-name="LatexBlockToDOM"></div><p>where <em>p</em> represents the Cartesian position, <em>s</em> represents the upright vector, <em>T</em> represents the (scalar) upward thrust, and &#964; represents the (2-dimensional) vector of roll and pitch torques.</p><p>These equations are quite simple, owing mainly to the fact that the wings are quite light, and so their flapping does not significantly impact the motion of the much more massive body. This concept is also used in many legged running robots, where it is referred to as &#8220;<a href="https://underactuated.mit.edu/humanoids.html">massless legs</a>.&#8221; It&#8217;s worth taking a minute to appreciate the significance of this: in practice, human limbs are not massless, which allows (for example) a gymnast to adjust their body orientation in midair by controllably moving their limbs, and so land a flip. However, mastering that kind of control is much more difficult than the massless legs (or wings) paradigm, where we can safely assume that the appendages simply produce a force or torque that acts on the body. A helpful picture to have in mind is that in the massless appendage paradigm, we can replace the appendages with thrusters attached at the appendage base, and pretend we are controlling the thrust vector instead.</p><p>Upon further inspection, the equations are second-order (as expected for any mechanical system). 
They are also unfortunately nonlinear, as can be seen from the products of <em>s</em> with <em>T</em> and with &#964; on the right-hand sides. This is also normal for such systems, but adds a challenge to our MPC transcription.</p><p>To resolve this difficulty, we <em>linearize</em> these dynamics at the current orientation and thrust <em>(s<sub>0</sub>, T<sub>0</sub>)</em> before incorporating them into the model-predictive controller. The controller will reason about the best inputs based on how they act on the current state, which intuitively is fine for a short enough planning horizon.</p><p>As an analogy, a car driver on the highway will turn their steering wheel slightly to change lanes (an action that is appropriate for a planning horizon of a few seconds), even though the same action would not be appropriate over a horizon long enough that they would drive off the highway. Similar to the car driver, the RoboBee in this scenario will re-evaluate its inputs with a new state soon enough. MPC always works this way, with a finite planning horizon extending a short duration from the current time.</p><p>The objective for the MPC is to track a trajectory of future states, each including a position and a velocity. For example, to hover, the desired position is the hovering goal position, and the desired velocity is zero. 
To follow a particular path in space, that path can be discretized and substituted into the desired positions.</p><h3>Simulation evaluation</h3><p>To evaluate if the MPC with the linearized dynamics above works appropriately, we can compare the performance of the controller in a number of simple tasks in an apples-to-apples comparison with the prior state-of-the-art reactive controller.</p><h4>Hovering, trajectory following</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XFiy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XFiy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XFiy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XFiy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XFiy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!XFiy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg" width="1456" height="655" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:655,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 4&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 4" title="Figure 4" srcset="https://substackcdn.com/image/fetch/$s_!XFiy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XFiy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XFiy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XFiy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc969caf9-8032-478f-8c0a-a713bf3a7d67_3976x1790.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Simulation evaluation of MPC vs. 
reactive on the upright model.</figcaption></figure></div><p>The tasks, as shown above, were:</p><ul><li><p>A hover task starting from an initial orientation with roll and pitch angles of 0.5 rad and -0.5 rad, and an initial velocity of 0.1 m/s in the <em>x</em>-direction.</p></li><li><p>Waypoint tracking on an &#8220;S&#8221;-shaped trajectory in the <em>xz</em>-plane.</p></li><li><p>Tracking a commanded velocity of 2 m/s for 0.5 seconds before stopping.</p></li></ul><p>In each of these scenarios, the MPC performs better than the reactive controller (notes on tuning below), which is promising.</p><h4>Perching, flipping</h4><p>Specifying a task in terms of a reference trajectory can be onerous. For example, if we want the bee to do a backflip, it isn&#8217;t clear what sequences of positions and velocities are appropriate for the horizon.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> To test the robustness of the MPC, here we feed it &#8220;made up&#8221; infeasible trajectories and see how well it can track them.</p><p>The tasks we choose to test include the aforementioned flip, and a wall-perching behavior inspired by this past research:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YwQT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YwQT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 424w, 
https://substackcdn.com/image/fetch/$s_!YwQT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 848w, https://substackcdn.com/image/fetch/$s_!YwQT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 1272w, https://substackcdn.com/image/fetch/$s_!YwQT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YwQT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png" width="551" height="177" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:177,&quot;width&quot;:551,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YwQT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 424w, 
https://substackcdn.com/image/fetch/$s_!YwQT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 848w, https://substackcdn.com/image/fetch/$s_!YwQT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 1272w, https://substackcdn.com/image/fetch/$s_!YwQT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2236d96d-ac96-430f-b214-e40a35889ac5_551x177.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">A perching task from <a href="https://www.science.org/doi/abs/10.1126/science.aaf1092">this paper</a> from 2016.</figcaption></figure></div><p>The reference trajectories are selected intentionally naively:</p><ul><li><p>For the perch task, the desired position translates smoothly to the right, and the desired orientation steadily rotates to 90 degrees at the end of the motion</p></li><li><p>For the flip task, the desired position is fixed, and the desired orientation smoothly rotates 360 degrees.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xi_9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xi_9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!Xi_9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Xi_9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Xi_9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xi_9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg" width="1456" height="422" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:422,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 5&quot;,&quot;title&quot;:&quot;Figure 5&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 5" title="Figure 5" srcset="https://substackcdn.com/image/fetch/$s_!Xi_9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!Xi_9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Xi_9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Xi_9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F981ff099-d657-4a4c-bc0d-4f5504b8f382_3976x1152.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Simulation evaluation of perch and flip behaviors.</figcaption></figure></div><p>The results show that the MPC is able to compensate for the naivet&#233; of the reference trajectories and accomplish both tasks satisfactorily. The reactive hover controller cannot solve these tasks.</p><h4>A note on tuning the controllers</h4><p>Something that most research papers sweep under the rug is how the controllers were tuned. The previous state-of-the-art reactive controller has hand-tuned PD gains, and the MPC has weights on its objective. To make a fair comparison, we have to tune both as well as possible.</p><p>In general, there is a tradeoff between tracking error and actuator effort. As an analogy, cruise control in cars often has an eco mode, in which the car may deviate from the speed setpoint a bit more, but wastes less fuel. Similarly, you can spend less actuator effort in exchange for tracking the goal a little less precisely. 
This is usually one of the ways in which controllers are tuned in practice.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L-j6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L-j6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 424w, https://substackcdn.com/image/fetch/$s_!L-j6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 848w, https://substackcdn.com/image/fetch/$s_!L-j6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!L-j6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L-j6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg" width="1456" height="753" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:753,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 7&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 7" title="Figure 7" srcset="https://substackcdn.com/image/fetch/$s_!L-j6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 424w, https://substackcdn.com/image/fetch/$s_!L-j6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 848w, https://substackcdn.com/image/fetch/$s_!L-j6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!L-j6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe510a6bb-0ee8-4b44-8849-e1540afa9002_2980x1542.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><strong>Left: </strong>The MPC can attain low tracking error over a broad range of weight magnitudes; <strong>right:</strong> comparison of the MPC and reactive controller tunings.</figcaption></figure></div><p>The plot on the right compares the MPC and the reactive controller across a variety of tuning gains. It shows that the MPC is significantly easier to tune, and can track better with lower actuator effort than is possible with the reactive controller.</p><h2>Data-driven inverse dynamics</h2><p>As we discussed above, the mapping from actuator signal &#8594; produced force/torque is uncertain due to the system complexity and manufacturing variability.</p><p>A common type of manufacturing variability is that some RoboBee transmissions are simply stiffer than others. 
If the left wing has a stiffer transmission than the right wing, the left wing may flap with a smaller wing amplitude than the right one when driven equivalently, and produce much less lift force.</p><p>In this project we took the approach of breaking down the components of this mapping, and just using data to characterize the variable parts. This meant collecting data of wing kinematics as a function of actuator signals and then fitting a function to approximate some &#8220;kinematics features&#8221; that could be expected for each actuator signal:&#8203;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XgSF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XgSF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XgSF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XgSF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XgSF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!XgSF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg" width="1456" height="1035" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1035,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 3&quot;,&quot;title&quot;:&quot;Figure 3&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 3" title="Figure 3" srcset="https://substackcdn.com/image/fetch/$s_!XgSF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XgSF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XgSF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XgSF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bafc7eb-ffda-488d-8a88-a99baa50c508_2502x1778.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 
pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The measured kinematics features were the wing flap upstroke and downstroke amplitudes, and the attained wing pitch.</figcaption></figure></div><p>We then used the blade-element model to predict the reaction force/torque from the wing kinematics.</p><p>To show the effect of this kind of mapping, we performed the same operation in the RoboBee simulator, and simulated the effect of adding a force bias of 3 mN to one of the actuators. 
With no force bias, the data-driven mapping and the manually-tuned mapping both work, but with the force bias, the data-driven mapping can still work while the manually-tuned mapping fails.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dA35!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dA35!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 424w, https://substackcdn.com/image/fetch/$s_!dA35!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 848w, https://substackcdn.com/image/fetch/$s_!dA35!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 1272w, https://substackcdn.com/image/fetch/$s_!dA35!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dA35!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805" width="896" height="805" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/05c9395d-501c-477c-ab26-7157c2ad65c2_896x805&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:805,&quot;width&quot;:896,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!dA35!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 424w, https://substackcdn.com/image/fetch/$s_!dA35!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 848w, https://substackcdn.com/image/fetch/$s_!dA35!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 1272w, https://substackcdn.com/image/fetch/$s_!dA35!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c9395d-501c-477c-ab26-7157c2ad65c2_896x805 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Comparison of data-driven (WLQP) inverse dynamics to manually tuned mapping.</figcaption></figure></div><h2>Hardware integration</h2><h3>Setup</h3><p>Encouraged by the simulation results, we pushed ahead to integrate the MPC into the physical RoboBee control system, which looks as follows:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lQGk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lQGk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!lQGk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 848w, https://substackcdn.com/image/fetch/$s_!lQGk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!lQGk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lQGk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg" width="1456" height="611" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:611,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 10&quot;,&quot;title&quot;:&quot;Figure 10&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 10" title="Figure 10" srcset="https://substackcdn.com/image/fetch/$s_!lQGk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!lQGk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 848w, https://substackcdn.com/image/fetch/$s_!lQGk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!lQGk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b2ba8d4-3b57-4330-99b5-79c59cdd979e_3796x1592.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">System architecture for RoboBee flight experiments and the actual experimental setup. The tether becomes slack during flight.</figcaption></figure></div><p>The actuators were connected to a <a href="https://www.mathworks.com/products/simulink-real-time.html">Simulink real-time</a> control PC, which was new to me. The setup encourages code to mostly be compiled from graphical blocks such as filters, delays, etc., but does allow for custom blocks written as MATLAB functions. While the state estimator and some other components were in fact MATLAB functions, we implemented the MPC in C using <a href="https://osqp.org/">OSQP</a>, as part of a more forward-looking architecture that could also run onboard the RoboBee on a microcontroller.</p><p>When run from the Simulink target PC, the iteration frequency was 5 kHz for everything, locked together due to the Simulink architecture. The MPC itself also ran at 100-200 Hz on a small STM32G4 MCU that fell within the 25 mg payload constraints of the RoboBee. We tested that the controller could successfully stabilize the simulator when run at a rate of 100 Hz.</p><h3>Experimental results for hovering</h3><p>A video clip of some of the hovering results was linked in the introduction of this post. 
Some overlaid trajectories from those trials are shown in the figure below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FX6O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FX6O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 424w, https://substackcdn.com/image/fetch/$s_!FX6O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 848w, https://substackcdn.com/image/fetch/$s_!FX6O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 1272w, https://substackcdn.com/image/fetch/$s_!FX6O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FX6O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png" width="480" height="613.3333333333334" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1426,&quot;width&quot;:1116,&quot;resizeWidth&quot;:480,&quot;bytes&quot;:564795,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.avikde.me/i/182263726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FX6O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 424w, https://substackcdn.com/image/fetch/$s_!FX6O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 848w, https://substackcdn.com/image/fetch/$s_!FX6O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 1272w, https://substackcdn.com/image/fetch/$s_!FX6O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4204f975-031f-4e20-8921-a701ce7ee376_1116x1426.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Each trial ended due to the motion capture system losing track of the RoboBee, or by a command we sent. 
We were able to keep the orientation stabilized in each trial, though the horizontal position drifted more than desired.</p><p>The hovering task was overall a good demonstration of the feasibility of integrating this much more advanced controller paradigm into the RoboBee.</p><p>In the future, it would be very exciting to see either or both:</p><ul><li><p>some of the tasks we tested (and compared to the reactive controller) in the simulation section running on the RoboBee</p></li><li><p>the controller running on a microcontroller, along with onboard sensing and power, for fully untethered complex flight</p></li></ul><h2>Implementation details and replicating results</h2><p>In the interests of open science, the code for the various parts of this project is all <a href="https://github.com/avikde/robobee3d">online</a>. While I don&#8217;t have continued access to the Simulink software and experimental setup, if you need help, please comment below. Continued progress and replicability are well worth the debugging effort.</p><h3>MPC</h3><p>The MPC is implemented as a quadratic program solved with OSQP.</p><ul><li><p>The quadratic program is defined in <a href="https://github.com/avikde/robobee3d/blob/master/template/genqp.py">genqp.py</a>. When that file is run as a script, it instantiates the controller and runs a test or, if the commented-out section at the bottom is enabled, runs <a href="https://osqp.org/docs/codegen/index.html">OSQP&#8217;s codegen</a> feature to generate a standalone set of C files that can solve the QP. The codegen output is stored in the <a href="https://github.com/avikde/robobee3d/tree/master/template/uprightmpc2">uprightmpc2</a> directory (though it can be regenerated as well).</p></li><li><p>The codegen outputs define the structure of the problem, but the problem data (the values in its matrices and vectors) need to be <a href="https://osqp.org/docs/examples/update-matrices.html">updated</a> as the current state of the RoboBee or the reference trajectory changes. 
To do this, the <a href="https://github.com/avikde/robobee3d/blob/master/template/uprightmpc2/uprightmpc2.h">uprightmpc2.h</a> file provides some simple interfaces with named parameters that can be called. The C file of the same name contains their implementation.</p></li><li><p>The C code in the uprightmpc2 directory can be built using CMake, with something like</p></li></ul><pre><code>cd uprightmpc2
mkdir -p build &amp;&amp; cd build
cmake ..
cmake --build .</code></pre><h3>Simulations</h3><ul><li><p>The simulations testing the MPC with the upright template model can be run from the <a href="https://github.com/avikde/robobee3d/tree/master/template">template</a> directory.</p></li><li><p>The <a href="https://github.com/avikde/robobee3d/blob/master/template/uprightmpc2.py">uprightmpc2.py</a> file should recreate the test scenarios covered in the plots above and in the paper when run as a script. The bottom of the file contains code describing the test scenarios that can be uncommented.</p></li><li><p>The 3D PyBullet simulation can be run by executing the <a href="https://github.com/avikde/robobee3d/blob/master/template/robobee.py">robobee.py</a> script.</p></li></ul><h3>Simulink setup</h3><ul><li><p>The C code is integrated into the Simulink real-time setup as an <a href="https://www.mathworks.com/help/simulink/sfg/what-is-an-s-function.html">S-function</a>; the legacy_code_gen.m file configures the inputs and outputs of the block that will appear in Simulink. 
See <a href="https://www.mathworks.com/help/simulink/sfg/integrating-existing-c-functions-into-simulink-models-with-the-legacy-code-tool.html">this page</a> for more guidance on this process, which was quite tricky.</p></li><li><p>The Simulink model files are .slx files, and can be found <a href="https://github.com/avikde/robobee3d/tree/master/template/matlab">here</a>.</p></li></ul><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://journals.sagepub.com/doi/pdf/10.1177/02783649211063225">An efficient, modular controller for flapping flight composing model-based and model-free components - Avik De, Rebecca McGill, Robert J Wood, 2022</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://github.com/avikde/robobee3d">avikde/robobee3d: Robobee research including controls, modeling, and simulation</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>In practice, for these kinds of tasks, state-of-the-art approaches commonly use offline optimization or learning (which takes much more computation to run) to figure out the best trajectory, and then use that trajectory as the reference for the MPC.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Power-efficient and safe mobile robots]]></title><description><![CDATA[Talk at OSU CoRIS seminar]]></description><link>https://www.avikde.me/p/power-efficient-safe-robots</link><guid isPermaLink="false">https://www.avikde.me/p/power-efficient-safe-robots</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Mon, 
23 Dec 2024 00:00:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bxUH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I gave a <a href="https://engineering.oregonstate.edu/events/power-efficient-autonomous-mobile-robots">talk at OSU&#8217;s CoRIS seminar</a>. It was a joy to visit OSU&#8217;s Robotics department. The faculty are driven to solve problems grounded in the real world, in application areas ranging from under the sea to the peak of Mt. Hood. Also, it was only partially raining on the day of the seminar (which I found out was a rarity).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bxUH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bxUH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bxUH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bxUH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!bxUH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bxUH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg" width="600" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;OSU&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="OSU" title="OSU" srcset="https://substackcdn.com/image/fetch/$s_!bxUH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bxUH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bxUH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!bxUH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b242a66-55dd-4508-ac58-f91691035686_600x600.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The <a href="https://en.wikipedia.org/wiki/Pacific_Northwest">PNW</a> scenery is terrific and would be a great draw if it didn&#8217;t mostly rain from September to May.</figcaption></figure></div><h2>Modularity</h2><p>In this talk, I started a bottom-up exploration of composition in robotics.</p><div 
class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.avikde.me/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading min{power} by avikde! Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h3>Dynamic legged locomotion</h3><p>Like many others, I&#8217;m sure, I was inspired as a young graduate student by the dynamic legged locomotion work at the MIT Leg Lab in the 1980s:</p><div id="youtube2-Bd5iEke6UlE" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Bd5iEke6UlE&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Bd5iEke6UlE?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>In his thought-provoking <a href="https://mitpress.mit.edu/9780262681193/legged-robots-that-balance/">book</a>, <a href="https://en.wikipedia.org/wiki/Marc_Raibert">Raibert</a> articulated an intriguing idea called &#8220;Control of Running Decomposed into Three Parts.&#8221; Since then, researchers have been trying to understand when and how this may be possible, and how it generalizes.</p><p>My Ph.D. advisor, <a href="https://directory.seas.upenn.edu/daniel-e-koditschek/">Koditschek</a>, has been doing that for decades. 
In the 1990s, his research group built an impressive array of juggling robots (as a less-power-hungry proxy for cyclic dynamical behavior):</p><div id="youtube2-u8I7EXXgTvk" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;u8I7EXXgTvk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/u8I7EXXgTvk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>In the course of the juggling research, they introduced a formal notion of <a href="https://deepblue.lib.umich.edu/bitstream/handle/2027.42/67990/10.1177_02783649922066385.pdf">sequential composition</a>, an intuitive but mathematically rigorous and useful idea:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z6_V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z6_V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 424w, https://substackcdn.com/image/fetch/$s_!Z6_V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 848w, 
https://substackcdn.com/image/fetch/$s_!Z6_V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 1272w, https://substackcdn.com/image/fetch/$s_!Z6_V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z6_V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png" width="400" height="407" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:407,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Sequential Composition in IJRR '99&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sequential Composition in IJRR '99" title="Sequential Composition in IJRR '99" srcset="https://substackcdn.com/image/fetch/$s_!Z6_V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 424w, https://substackcdn.com/image/fetch/$s_!Z6_V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 848w, 
https://substackcdn.com/image/fetch/$s_!Z6_V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 1272w, https://substackcdn.com/image/fetch/$s_!Z6_V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780dd6e7-06fb-48b3-aeee-c5ae20d7f8d4_400x407.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The &#8220;funnels&#8221; picture of sequential composition</figcaption></figure></div><p>Analogously, we can 
retroactively label Raibert&#8217;s &#8220;control in three parts&#8221; idea as an example of <a href="/jerboa-hopping-video">parallel composition</a>. While the term is not very common in the robotics literature, similar concepts appear under names such as &#8220;decoupled control&#8221;. The idea has clearly been empirically useful, but <a href="/hybrid-averaging">formalizing it</a> with any degree of generality has been quite tricky.</p><p>Sequential and parallel composition are very intuitive ideas with equivalents in programming and spoken language. Consider the example of generating spoken language &#8211; instead of outputting the sounds corresponding to an entire sentence at once, we may want to start by assembling words from <a href="https://en.wikipedia.org/wiki/Phoneme">phonemes</a>, and then assemble those words into sentences. On the other hand, modern <a href="https://en.wikipedia.org/wiki/Deep_learning_speech_synthesis">deep learning speech synthesis</a> may not have any such compositional properties, which is an intentional counterpoint that we will return to.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/ChengleiSi/status/1731047065382523332?s=20&quot;,&quot;full_text&quot;:&quot;I saw debates on whether GPT-4V can &#8220;solve&#8221; compositionality, so I spent my precious Friday afternoon benchmarking it on Winoground.\n\nTldr: NO it&#8217;s still far from solved (GPT-4V 38.0% vs PaLI 28.8% vs MTurk Humans 85.5%).\n\nColab w/ all results: <a class=\&quot;tweet-url\&quot; href=\&quot;https://tinyurl.com/winogpt4v\&quot;>tinyurl.com/winogpt4v</a> 
\n\n&#129525;(1/n)&quot;,&quot;username&quot;:&quot;ChengleiSi&quot;,&quot;name&quot;:&quot;CLS&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/1356609929243734018/FDzdwcv6_normal.jpg&quot;,&quot;date&quot;:&quot;2023-12-02T20:25:56.000Z&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:7,&quot;retweet_count&quot;:48,&quot;like_count&quot;:323,&quot;impression_count&quot;:115213,&quot;expanded_url&quot;:null,&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h3>Modularity elsewhere</h3><p>Deep learning did evolve from neural networks, which evoke biology right in the name. Biology has <a href="/what-are-robot-dogs">inspired many of the working principles</a> of quadrupedal robots, including behavioral modularity.</p><p>Animals have an abundance of sensory inputs and muscles, but the number of task-level variables important to any particular task is a lot smaller (<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC4121431/">Ting (2007)</a>). Going further, <a href="https://www.sciencedirect.com/science/article/pii/S0896627315001579">Ting et al. (2015)</a> argue that motor modules arise from neural plasticity in spinal structures that selectively coordinate and co-activate multiple muscles. 
The result is that animals can control tasks like balancing in a hierarchical fashion, keeping the dimension of the task-space control low.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8aEl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8aEl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 424w, https://substackcdn.com/image/fetch/$s_!8aEl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 848w, https://substackcdn.com/image/fetch/$s_!8aEl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 1272w, https://substackcdn.com/image/fetch/$s_!8aEl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8aEl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png" width="1456" height="638" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:638,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Modules in Biology&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Modules in Biology" title="Modules in Biology" srcset="https://substackcdn.com/image/fetch/$s_!8aEl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 424w, https://substackcdn.com/image/fetch/$s_!8aEl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 848w, https://substackcdn.com/image/fetch/$s_!8aEl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 1272w, https://substackcdn.com/image/fetch/$s_!8aEl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5335aabb-38d5-4b28-a7d5-b4b07775e040_1712x750.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Modularity in biology</figcaption></figure></div><p>While robots typically have fewer actuators than an animal has muscles, a general-purpose robot will typically be overactuated for any individual task. For example, a humanoid robot will not need its arms to maintain a standing posture.</p><p>If we accept the presence of these motor modules, these patterns of activation could be re-used for different behaviors. Quoting <a href="https://www.sciencedirect.com/science/article/pii/S0896627315001579">Ting et al. 
(2015)</a>:</p><blockquote><p>Multifunctionality: muscles can contribute to many actions; a few muscles can be combined in many ways to produce a wide range of different actions.</p></blockquote><p>Making equivalences to the synthetic disciplines, there is a clear connection to the idea of re-using behavioral modules, as we showed with <a href="/vertical-hopper-compositions">Minitaur vertical hopper compositions</a>.</p><p>Putting it all together, I&#8217;d argue that there are equivalences between biology and robotics in three distinct aspects of modularity:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WRpR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WRpR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 424w, https://substackcdn.com/image/fetch/$s_!WRpR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 848w, https://substackcdn.com/image/fetch/$s_!WRpR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 1272w, https://substackcdn.com/image/fetch/$s_!WRpR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!WRpR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png" width="1341" height="290" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:290,&quot;width&quot;:1341,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Modularity is Everywhere&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Modularity is Everywhere" title="Modularity is Everywhere" srcset="https://substackcdn.com/image/fetch/$s_!WRpR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 424w, https://substackcdn.com/image/fetch/$s_!WRpR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 848w, https://substackcdn.com/image/fetch/$s_!WRpR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 1272w, https://substackcdn.com/image/fetch/$s_!WRpR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe86aebea-3af7-475f-9828-6ceb76c5f8a9_1341x290.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h3>Modularity benefits</h3><p>Some 
of the benefits of modularity enjoyed by biological systems can also apply to the synthetic disciplines.</p><p>Motor modules can help navigate a &#8220;difficult-to-search and nonlinear set of neuromechanical solutions for movement&#8221; (<a href="https://www.sciencedirect.com/science/article/pii/S0896627315001579">Ting et al. (2015)</a>) as well as the &#8220;curse of dimensionality&#8221; in various engineering disciplines. This has clear implications for the computational requirements of algorithms.</p><p>A slightly less obvious use case for modularity is optimizing robot design for <a href="/template-based-design-robobee">flapping</a>, <a href="https://www.science.org/doi/abs/10.1126/scirobotics.aag2048">jumping</a>, etc., using coordinated movement patterns (or template trajectories).</p><h2>Real-world robotics</h2><p>As robotics tools proliferate, their side effects will have an increasingly large impact on society.</p><h3>Safety and predictability</h3><p>The autonomous vehicle industry is possibly the first (but certainly not the last) subfield that has been thrust into the limelight on the question of the safety of autonomous systems. The responsible peer-reviewed efforts of the first-party companies (e.g. <a href="https://waymo.com/safety/research/">Waymo</a>) are huge steps in the right direction, but that is certainly not the end of the story.</p><p>The robustness and multiplicity of solutions inherent to a modular structure (as we saw above) are in stark contrast to the weakness of monolithic AI structures when subject to uncertainty (<a href="https://ieeexplore.ieee.org/document/10778107">Cummings</a>).</p><p>Intuitively, a modular architecture can be &#8220;debugged&#8221;, and intermediate outputs can be logged and inspected. 
Just as an aircraft&#8217;s black box recording allows review of the pilot&#8217;s inputs to the machine, a modular structure allows insight into, and thresholding of, the function of individual modules:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LINj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LINj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 424w, https://substackcdn.com/image/fetch/$s_!LINj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 848w, https://substackcdn.com/image/fetch/$s_!LINj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 1272w, https://substackcdn.com/image/fetch/$s_!LINj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LINj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png" width="800" height="278" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:278,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Safety and Predictability&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Safety and Predictability" title="Safety and Predictability" srcset="https://substackcdn.com/image/fetch/$s_!LINj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 424w, https://substackcdn.com/image/fetch/$s_!LINj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 848w, https://substackcdn.com/image/fetch/$s_!LINj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 1272w, https://substackcdn.com/image/fetch/$s_!LINj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5500cc-b94c-42c8-b339-edb5d6050dab_800x278.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Safety via modularity</figcaption></figure></div><h3>Energy</h3><p>While mechanical work done by robots necessarily needs energetic input (and the conversion efficiency can be <a href="https://www.worldscientific.com/doi/abs/10.1142/9789814415958_0057">quite high</a>), the cost of computational work is nowhere close to the only known fundamental energetic limit based on <a href="https://en.wikipedia.org/wiki/Landauer%27s_principle">Landauer&#8217;s principle</a>.</p><p>Even as chips get more and more efficient, our appetite for computation outstrips those benefits, raising <a href="https://www.nature.com/articles/d41586-024-03408-z">continual</a> <a href="https://www.technologyreview.com/2024/12/13/1108719/ais-emissions-are-about-to-skyrocket-even-further/">concern</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!uomH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uomH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 424w, https://substackcdn.com/image/fetch/$s_!uomH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 848w, https://substackcdn.com/image/fetch/$s_!uomH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 1272w, https://substackcdn.com/image/fetch/$s_!uomH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uomH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png" width="1110" height="502" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:502,&quot;width&quot;:1110,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Energy&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Energy" title="Energy" srcset="https://substackcdn.com/image/fetch/$s_!uomH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 424w, https://substackcdn.com/image/fetch/$s_!uomH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 848w, https://substackcdn.com/image/fetch/$s_!uomH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 1272w, https://substackcdn.com/image/fetch/$s_!uomH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3a3b5a-49a0-49f2-a29c-212fdc1884a6_1110x502.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">&#8220;AI&#8217;s energy crisis&#8221;</figcaption></figure></div><p>As already recognized by biology, a growing community of researchers are exploiting the fact that <a href="https://dl.acm.org/doi/10.1145/3408062">modular neural networks reduce power consumption</a>.</p><h2>The case for compositionality</h2><p>Modularity comes with a price. The motor modules in humans have appeared over the (long) course of animal evolution, and the modular control structures developed for robots need to be hand-crafted. These processes are much less automatic, and <a href="https://en.wikipedia.org/wiki/Attention_Is_All_You_Need">need more work than</a> scaling a simple structure with more data. 
In fact, the importance of pushing for architectural progress may not be limited to robotics (<a href="https://thenextweb.com/news/meta-yann-lecun-ai-behind-human-intelligence">LeCun</a>).</p><p>Additionally, modularity necessarily imposes limits on the space of usable methods or algorithms. For example, a modular controller reasoning with the equivalent of &#8220;motor modules&#8221; for a triple pendulum would never be able to accomplish this:</p><div id="youtube2-lbJfh0MOcp0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;lbJfh0MOcp0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/lbJfh0MOcp0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Nevertheless, the question of system abstraction with modularity has come up before in other fields such as digital VLSI and programming languages, where modular abstraction has clearly won out, in part for the reasons discussed above.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TNPx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TNPx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 424w, 
https://substackcdn.com/image/fetch/$s_!TNPx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 848w, https://substackcdn.com/image/fetch/$s_!TNPx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 1272w, https://substackcdn.com/image/fetch/$s_!TNPx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TNPx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png" width="800" height="388" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:388,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Abstraction&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Abstraction" title="Abstraction" srcset="https://substackcdn.com/image/fetch/$s_!TNPx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 424w, 
https://substackcdn.com/image/fetch/$s_!TNPx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 848w, https://substackcdn.com/image/fetch/$s_!TNPx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 1272w, https://substackcdn.com/image/fetch/$s_!TNPx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F953e7f36-aba1-48f3-a58c-4bd53d608b23_800x388.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Abstraction in computer engineering</figcaption></figure></div><p>We don&#8217;t yet have a generally accepted methodology or architecture in robotics that could be a foundation for symbolic behavior programming.</p><p>End-to-end deep neural networks have become a useful and generally-accepted architecture without compositional properties, but neural networks are not necessarily incompatible with compositionality (<a href="https://direct.mit.edu/neco/article/35/3/413/114140/How-to-Represent-Part-Whole-Hierarchies-in-a">Hinton</a>, <a href="https://compositionalintelligence.github.io/pdfs/Marcus.pdf">Marcus</a>). For more on this topic, I highly recommend the proceedings of this workshop on <a href="https://compositionalintelligence.github.io/">The Challenge of Compositionality for AI</a>.</p><p>What is the path forward?</p><p>If we value the benefits of modularity discussed above, it will take more work to develop the correct architectures, but this work is essential to get to the point of robotics becoming a true scientific discipline with predictable outcomes.</p>]]></content:encoded></item><item><title><![CDATA[Approximating cyclic dynamics utilizing symmetry]]></title><description><![CDATA[Paper on Hybrid averaging (IJRR 2018)]]></description><link>https://www.avikde.me/p/hybrid-averaging</link><guid isPermaLink="false">https://www.avikde.me/p/hybrid-averaging</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Sun, 22 Dec 2024 00:00:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IEy8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve previously written about the generative possibilities with parallel composition of reduced-order models. 
</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;46f06827-ad6f-42a9-abbf-e131379b682c&quot;,&quot;caption&quot;:&quot;Extending the methodology for hopping behaviors on Jerboa to Minitaur required understanding how to compose monopedal hopping primitives onto a quadrupedal robot. To do this, we built on the old idea of virtual bipeds, but in a way that would be compatible with the formal guarantees of the&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Vertical hopper compositions (IJRR 2018)&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Interested in safe, efficient AI | Robotics founder&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-12-22T00:00:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/youtube/w_728,c_limit/ijnOCQOpC7k&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://minpower.substack.com/p/vertical-hopper-compositions&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:182198520,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Both these projects were extensions 
of <a href="https://mitpress.mit.edu/9780262681193/legged-robots-that-balance/">Raibert&#8217;s intriguing concept</a> of &#8220;control in three parts&#8221; for the MIT Leg Lab planar hopper. However, it has been very difficult to formalize when this type of control may work. In some related work, it is called &#8220;decoupled control,&#8221; but it is clear that any robotic system of practical value will not have a <a href="https://math.mit.edu/~jorloff/suppnotes/suppnotes03/ls4.pdf">decoupling property</a>. The hallmark of articulated mechanical systems is that energy can be transferred among different components, which makes them expressive and capable, but also <a href="https://en.wikipedia.org/wiki/Double_pendulum#Chaotic_motion">difficult to analyze</a>.</p><p>In this project and paper<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, we proposed <a href="https://journals.sagepub.com/doi/full/10.1177/0278364918756498">hybrid dynamical averaging</a> as a way to make progress toward formal arguments about these complex systems. This project only scratched the surface, but we applied this idea to <a href="/vertical-hopper-compositions">Minitaur vertical hopping</a> in a sequel. I think the idea still has a lot of potential to help us make formal guarantees about the behavior of complex systems with symmetries (ubiquitous in locomotion), and it may play an important role in safety-critical scenarios.</p>
<h3>Dynamical averaging</h3><p>Since the exact system dynamics are difficult to directly analyze, it&#8217;s helpful to approximate the behavior. To this end, we looked to the well-established theory of dynamical averaging (<a href="https://link.springer.com/book/10.1007/978-1-4612-1140-2">Guckenheimer and Holmes</a>), which applies to cyclic dynamical systems.</p><p>The idea is that a dynamical system with a limit cycle can be viewed in terms of a single &#8220;fast&#8221; coordinate <em>along</em> the limit cycle, and several <em>orthogonal</em> &#8220;slow&#8221; coordinates. As an example, an undamped oscillating spring-mass system conserves energy, so the total mechanical energy is clearly a &#8220;slow&#8221; coordinate (it doesn&#8217;t change at all). A very slight generalization is a regulated oscillatory system, where the energy might vary as it stabilizes to its limit cycle.</p><p>These &#8220;fast/slow&#8221; dynamical systems can be approximated by averaging the <em>dynamics</em> along the limit cycle, over one period <em>T</em>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\dot x = f(x, t) \\approx \\bar f (x) := \\frac{1}{T} \\int_{t\\in \\mathscr{T}} f(x,t) \\, dt&quot;,&quot;id&quot;:&quot;JLZZKELEOY&quot;}" data-component-name="LatexBlockToDOM"></div><p>By doing this, we have reduced the dimensionality of our system by one, and crucially, eliminated any variability in <em>f</em> that &#8220;averages out&#8221; over a cycle. 
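</p><p>As a small worked illustration (a textbook-style toy example, not taken from the paper), assume a scalar system <em>dx/dt</em> = &#8722;&#949; <em>x</em> sin&#178;<em>t</em>, with period <em>T</em> = 2&#960; and a small constant &#949; &gt; 0. Both the system and the symbol &#949; are assumptions made only for this sketch:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\dot x = -\\epsilon\\, x \\sin^2 t \\quad\\Longrightarrow\\quad \\bar f(x) = \\frac{1}{2\\pi} \\int_0^{2\\pi} \\left(-\\epsilon\\, x \\sin^2 t\\right) dt = -\\frac{\\epsilon}{2}\\, x&quot;}" data-component-name="LatexBlockToDOM"></div><p>The oscillating factor sin&#178;<em>t</em> averages to 1/2 over a period, so the averaged system is a plain exponential decay at rate &#949;/2, with the within-cycle ripple removed.</p><p>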
This reduction has huge intuitive (and mathematical) implications that we will come back to below.</p><h3>Choosing coordinates</h3><p>The spring-mass example above was conveniently two-dimensional, resulting in a single slow coordinate that we easily identified with total energy. What happens when there are (for example) several coupled spring-mass oscillators, with total system dimension <em>n</em>?</p><p>This situation arises concretely in the <a href="/vertical-hopper-compositions">Minitaur vertical hopping</a> demonstrations, and appears frequently in coupled mechanical systems.</p><p>Our idea here was to think about the different coordinates as being:</p><ul><li><p>A single fast coordinate identified as the system <em>phase</em>: the cyclic coordinate that increments at a near-constant rate</p></li><li><p>A single slow coordinate identified as the system <em>energy</em>: an overall measure of the &#8220;amplitude&#8221; of the system</p></li><li><p><em>n</em>&#8722;2 slow coordinates identified as <em>phase differences</em>: the relative phase of different degrees of freedom of the system</p></li></ul><p>These coordinates are loosely related to (but not formally connected to) <a href="https://en.wikipedia.org/wiki/Hamiltonian_mechanics">Hamiltonian phase space coordinates</a>. The phase differences intuitively arise when several degrees of freedom are coordinated together, as in a set of coupled oscillators.</p><p>A visual depiction of these coordinates, with (n=3), is shown below, where the blue manifold corresponds to the energy <a href="https://mathworld.wolfram.com/ZeroSet.html">zero set</a>, and the red manifold corresponds to the phase difference zero set. 
We intuitively refer to these as the <em>regulated</em> and <em>neutral</em> sets:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IEy8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IEy8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png 424w, https://substackcdn.com/image/fetch/$s_!IEy8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png 848w, https://substackcdn.com/image/fetch/$s_!IEy8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png 1272w, https://substackcdn.com/image/fetch/$s_!IEy8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IEy8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png" width="800" height="382" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:382,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Limit cycle&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Limit cycle" title="Limit cycle" srcset="https://substackcdn.com/image/fetch/$s_!IEy8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png 424w, https://substackcdn.com/image/fetch/$s_!IEy8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png 848w, https://substackcdn.com/image/fetch/$s_!IEy8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png 1272w, https://substackcdn.com/image/fetch/$s_!IEy8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Picture depicting regulated and neutral sets</figcaption></figure></div><p><em>Note that for mechanical second-order systems, (n) must be even (twice the number of degrees of freedom); the (n=3) case is for illustration.</em></p><h3>Hybrid averaging</h3><p>The major contribution of this paper was to prove that dynamical averaging can be extended to hybrid systems, which combine continuous flows with discrete resets. 
To do this, we used the <a href="https://arxiv.org/abs/2306.06862">saltation matrix</a> to capture the effect of the hybrid reset near the limit cycle.</p><p>A visual depiction of this is below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hGMN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b7642e1-4942-4b5e-9afe-e3d9dcbd67dd_800x637.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hGMN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b7642e1-4942-4b5e-9afe-e3d9dcbd67dd_800x637.png 424w, https://substackcdn.com/image/fetch/$s_!hGMN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b7642e1-4942-4b5e-9afe-e3d9dcbd67dd_800x637.png 848w, https://substackcdn.com/image/fetch/$s_!hGMN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b7642e1-4942-4b5e-9afe-e3d9dcbd67dd_800x637.png 1272w, https://substackcdn.com/image/fetch/$s_!hGMN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b7642e1-4942-4b5e-9afe-e3d9dcbd67dd_800x637.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hGMN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b7642e1-4942-4b5e-9afe-e3d9dcbd67dd_800x637.png" width="800" height="637" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b7642e1-4942-4b5e-9afe-e3d9dcbd67dd_800x637.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:637,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Averaged limit cycle&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Averaged limit cycle" title="Averaged limit cycle" srcset="https://substackcdn.com/image/fetch/$s_!hGMN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b7642e1-4942-4b5e-9afe-e3d9dcbd67dd_800x637.png 424w, https://substackcdn.com/image/fetch/$s_!hGMN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b7642e1-4942-4b5e-9afe-e3d9dcbd67dd_800x637.png 848w, https://substackcdn.com/image/fetch/$s_!hGMN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b7642e1-4942-4b5e-9afe-e3d9dcbd67dd_800x637.png 1272w, https://substackcdn.com/image/fetch/$s_!hGMN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b7642e1-4942-4b5e-9afe-e3d9dcbd67dd_800x637.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The red line represents the continuous trajectory with fast coordinate (x1) and slow coordinate (x2), which intersects the guard surface <em>G</em>, and then follows the reset map (R). This guard condition can result in variable flow durations away from the limit cycle, so we use the saltation matrix to create a &#8220;straightened&#8221; guard set (<em>G bar</em>) having the same stability properties as the original system. This &#8220;straightened guard&#8221; flow now has a fixed period for (x1), so standard dynamical averaging theorems can be applied. 
The bottom plot shows simulations of a vertical hopping system with the actual (purple) and hybrid-averaged (orange) flows, showing their correspondence.</figcaption></figure></div><p>A much more technical treatment and formal proof of this intuitive idea are, of course, in the <a href="https://journals.sagepub.com/doi/full/10.1177/0278364918756498">paper</a>.</p><h3>Symmetries and averaging</h3><p>Applying (hybrid) averaging to study locomotion behaviors like hopping and running let us incorporate <em>time-reversal symmetry</em> to get amazing and intuitive reductions in system complexity.</p><p>First, the following slide, which includes some figures from <a href="https://mitpress.mit.edu/9780262681193/legged-robots-that-balance/">Legged Robots That Balance</a>, conveys the ubiquity of time-reversal symmetry in locomotion. The symmetry is easy to see in systems like a one-legged hopper, but it also appears in much more complex scenarios like a cat galloping.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qFrJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadec5db-67a7-4e6e-b1e9-3d5296592ac9_800x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qFrJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadec5db-67a7-4e6e-b1e9-3d5296592ac9_800x592.png 424w, https://substackcdn.com/image/fetch/$s_!qFrJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadec5db-67a7-4e6e-b1e9-3d5296592ac9_800x592.png 848w, 
https://substackcdn.com/image/fetch/$s_!qFrJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadec5db-67a7-4e6e-b1e9-3d5296592ac9_800x592.png 1272w, https://substackcdn.com/image/fetch/$s_!qFrJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadec5db-67a7-4e6e-b1e9-3d5296592ac9_800x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qFrJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadec5db-67a7-4e6e-b1e9-3d5296592ac9_800x592.png" width="800" height="592" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fadec5db-67a7-4e6e-b1e9-3d5296592ac9_800x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:592,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Time-reversal symmetry&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Time-reversal symmetry" title="Time-reversal symmetry" srcset="https://substackcdn.com/image/fetch/$s_!qFrJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadec5db-67a7-4e6e-b1e9-3d5296592ac9_800x592.png 424w, https://substackcdn.com/image/fetch/$s_!qFrJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadec5db-67a7-4e6e-b1e9-3d5296592ac9_800x592.png 848w, 
https://substackcdn.com/image/fetch/$s_!qFrJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadec5db-67a7-4e6e-b1e9-3d5296592ac9_800x592.png 1272w, https://substackcdn.com/image/fetch/$s_!qFrJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadec5db-67a7-4e6e-b1e9-3d5296592ac9_800x592.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Symmetry appears in all sorts of systems</figcaption></figure></div><p>As the slide suggests, the referenced symmetry is 
not merely a property of the resulting trajectory, but a property of the dynamics itself (i.e. the dynamics (xdot = f(x, u)) exhibits <a href="https://en.wikipedia.org/wiki/Even_and_odd_functions">symmetry</a> with respect to various components). This has exciting connections to <a href="https://en.wikipedia.org/wiki/Hamiltonian_mechanics">Hamiltonian mechanics</a> and <a href="https://en.wikipedia.org/wiki/Noether%27s_theorem">Noether&#8217;s theorem</a> that beg further exploration.</p><p>As annotated on the slide, the &#8220;symmetric&#8221; hopping trajectory in bold in the bottom right figure can be distinguished from all the asymmetric trajectories. These simply correspond to being on the &#8220;neutral&#8221; set in our nomenclature above, or not, respectively. Putting all this together, when the dynamics are averaged at a neutral limit cycle, the dynamics are greatly simplified (intuitively, <a href="https://en.wikipedia.org/wiki/Even_and_odd_functions">odd functions</a> integrate out), giving us a great degree of analytical simplification.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><em>The <a href="https://journals.sagepub.com/doi/full/10.1177/0278364918756498">paper</a> corresponding to this article was published in 2018.</em></p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Using models to design a RoboBee]]></title><description><![CDATA[Paper in IROS 2020 about using templates and optimization for robot design]]></description><link>https://www.avikde.me/p/template-based-design-robobee</link><guid isPermaLink="false">https://www.avikde.me/p/template-based-design-robobee</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Sun, 22 Dec 2024 00:00:00 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!_Aa4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>While there have been numerous projects where I&#8217;ve utilized reduced-order models for control, this is the first published<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> one where I&#8217;ve been able to use them for optimizing robot design. It is also applied to a fairly complex system, the <a href="https://en.wikipedia.org/wiki/RoboBee">RoboBee</a>, justifying the effort expended into developing numerical methods for design.</p><h3>System architecture</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_Aa4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_Aa4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png 424w, https://substackcdn.com/image/fetch/$s_!_Aa4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png 848w, https://substackcdn.com/image/fetch/$s_!_Aa4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png 1272w, 
https://substackcdn.com/image/fetch/$s_!_Aa4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_Aa4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png" width="1000" height="523" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:523,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Idea&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Idea" title="Idea" srcset="https://substackcdn.com/image/fetch/$s_!_Aa4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png 424w, https://substackcdn.com/image/fetch/$s_!_Aa4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png 848w, https://substackcdn.com/image/fetch/$s_!_Aa4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png 1272w, 
https://substackcdn.com/image/fetch/$s_!_Aa4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ff5de3-8a25-4101-810f-767f7f11b5f7_1000x523.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Overall architecture for template-based design</figcaption></figure></div><p>The starting points for this method are a template model, and a task. In this case, the task is flapping, and the model is the <a href="https://en.wikipedia.org/wiki/Blade_element_theory">blade element model</a> applied to flapping wings. 
Those two in conjunction give us a dynamical trajectory, with kinematics and interaction forces with the environment.</p><p>These kinematics are parameterized by the parameters of the model that generated them. For example, if a flapping wing has more inertia, we can expect a smaller angular amplitude when subject to the same flapping torque.</p><p>For our design optimization, we non-dimensionalize the times and lengths in the trajectory. With this non-dimensional trajectory <em>y(t)</em> and a parameter vector <em>p</em>, we use the fact that in mechanical systems the dynamics are affine in <em>p</em>, i.e. the dynamical equations of motion can be written in the form</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;U(y+\\Delta y) f(p) = 0&quot;,&quot;id&quot;:&quot;SIVDIDSCHJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>This fact underpins classical <a href="https://en.wikipedia.org/wiki/Adaptive_control">adaptive control</a>, and in our case it allows us to establish a bilinear constraint between state and parameter variables.</p><p>So, the design optimization problem attempts to find parameters that minimize energy consumption, or another objective specified in <em>&#981;(p)</em>, while minimizing the deviation from the desired trajectory <em>&#8214;&#916;y&#8214;</em>, subject to the system dynamics.</p><h3>Nonlinear transmission design</h3><p>The main empirical contribution of this paper was a new type of transmission I designed for the RoboBee (<strong>A</strong> in the figure below), where the transmission ratio varied as a function of the actuator angle.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a2Qo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F710acc39-1c69-4080-ae97-67e332db1673_1000x302.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a2Qo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F710acc39-1c69-4080-ae97-67e332db1673_1000x302.png 424w, https://substackcdn.com/image/fetch/$s_!a2Qo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F710acc39-1c69-4080-ae97-67e332db1673_1000x302.png 848w, https://substackcdn.com/image/fetch/$s_!a2Qo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F710acc39-1c69-4080-ae97-67e332db1673_1000x302.png 1272w, https://substackcdn.com/image/fetch/$s_!a2Qo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F710acc39-1c69-4080-ae97-67e332db1673_1000x302.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a2Qo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F710acc39-1c69-4080-ae97-67e332db1673_1000x302.png" width="1000" height="302" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/710acc39-1c69-4080-ae97-67e332db1673_1000x302.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:302,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Results&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Results" title="Results" 
srcset="https://substackcdn.com/image/fetch/$s_!a2Qo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F710acc39-1c69-4080-ae97-67e332db1673_1000x302.png 424w, https://substackcdn.com/image/fetch/$s_!a2Qo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F710acc39-1c69-4080-ae97-67e332db1673_1000x302.png 848w, https://substackcdn.com/image/fetch/$s_!a2Qo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F710acc39-1c69-4080-ae97-67e332db1673_1000x302.png 1272w, https://substackcdn.com/image/fetch/$s_!a2Qo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F710acc39-1c69-4080-ae97-67e332db1673_1000x302.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Nonlinear transmission design</figcaption></figure></div><p>The green line in <strong>B</strong> shows an ideal nonlinear transmission (with ratio <em>&#964;</em> = <em>&#964;</em><sub>1</sub> + <em>&#964;</em><sub>2</sub> <em>q</em><sub>act</sub>), and the yellow/blue lines show the physically realizable transmission developed using a non-parallel linkage system for this paper.</p><p>Intuitively, the goal was to have a lower transmission ratio (higher mechanical advantage) at midstroke, where the drag force is highest, and a higher transmission ratio (lower mechanical advantage) at the end-stroke positions, where the drag force is lower.</p><p><strong>C/D</strong> in the figure above show the simulated results of using such a nonlinear transmission: the specific lift force (normalized by actuator mass) can be increased using this kind of nonlinearity.</p><h3>Co-design pitfalls</h3><p>The bilinearity of the constraint above captures some of the difficult aspects of &#8220;co-design&#8221; (i.e. simultaneous design and control). 
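</p><p>A small worked example makes the difficulty concrete. Take scalar stand-ins <em>y</em> and <em>p</em> (purely illustrative, not the actual variables of the RoboBee problem) and the bilinear constraint <em>y p</em> = 1. The constraint is linear in <em>y</em> when <em>p</em> is held fixed, and linear in <em>p</em> when <em>y</em> is held fixed, but the set of pairs satisfying it is a hyperbola, which is not convex: (<em>y</em>, <em>p</em>) = (1, 1) and (2, 1/2) both satisfy <em>y p</em> = 1, while their midpoint (3/2, 3/4) gives <em>y p</em> = 9/8. Optimizing over state and parameters jointly can therefore give up the convexity that either subproblem might have had on its own.</p><p>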
The plots below show some slices of contours of the objective function for the RoboBee design problem with highly nonlinear level sets.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Kqqq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3ef3671-6b7b-4b6b-99bc-4cb62a47a5a0_600x304.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Kqqq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3ef3671-6b7b-4b6b-99bc-4cb62a47a5a0_600x304.png 424w, https://substackcdn.com/image/fetch/$s_!Kqqq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3ef3671-6b7b-4b6b-99bc-4cb62a47a5a0_600x304.png 848w, https://substackcdn.com/image/fetch/$s_!Kqqq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3ef3671-6b7b-4b6b-99bc-4cb62a47a5a0_600x304.png 1272w, https://substackcdn.com/image/fetch/$s_!Kqqq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3ef3671-6b7b-4b6b-99bc-4cb62a47a5a0_600x304.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Kqqq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3ef3671-6b7b-4b6b-99bc-4cb62a47a5a0_600x304.png" width="600" height="304" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e3ef3671-6b7b-4b6b-99bc-4cb62a47a5a0_600x304.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:304,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Objective&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Objective" title="Objective" srcset="https://substackcdn.com/image/fetch/$s_!Kqqq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3ef3671-6b7b-4b6b-99bc-4cb62a47a5a0_600x304.png 424w, https://substackcdn.com/image/fetch/$s_!Kqqq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3ef3671-6b7b-4b6b-99bc-4cb62a47a5a0_600x304.png 848w, https://substackcdn.com/image/fetch/$s_!Kqqq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3ef3671-6b7b-4b6b-99bc-4cb62a47a5a0_600x304.png 1272w, https://substackcdn.com/image/fetch/$s_!Kqqq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3ef3671-6b7b-4b6b-99bc-4cb62a47a5a0_600x304.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Some observations from the plots above:</p><ul><li><p>The left plot shows the feasible set of wing mass (vertical axis) versus wing area (horizontal axis), which lies between the two blue lines. The purple line is the minimum lift constraint needed to fly, which requires a minimum wing area. The optimization found a unique optimum in the top-right feasible region.</p></li><li><p>The right plot shows two parameters controlling the transmission ratio between the actuator and the wings. The feasible set of transmission designs is under the blue line in this case. The red line is a constraint on the maximum actuator displacement allowed (so that the piezoelectric actuator does not break), and a conservative linearization of that constraint is shown by the purple line. 
The objective function is again highly nonlinear, but the design optimization is able to find an optimum that would likely have been hard to find by intuition alone.</p></li></ul><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><em>The <a href="https://scholar.google.com/citations?view_op=view_citation&amp;hl=en&amp;user=m-A4ZdEAAAAJ&amp;sortby=pubdate&amp;citation_for_view=m-A4ZdEAAAAJ:ODE9OILHJdcC">paper</a> corresponding to this article was published in 2020.</em></p></div></div>]]></content:encoded></item><item><title><![CDATA[Minitaur bounding, pronking using vertical hopper compositions]]></title><description><![CDATA[Simple controllers produce exciting quadrupedal behaviors - paper in IJRR 2018]]></description><link>https://www.avikde.me/p/vertical-hopper-compositions</link><guid isPermaLink="false">https://www.avikde.me/p/vertical-hopper-compositions</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Sun, 22 Dec 2024 00:00:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/ijnOCQOpC7k" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>Extending<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> the methodology for <a href="/jerboa-hopping-video">hopping behaviors on Jerboa</a> to <a href="/ghost-robotics-minitaur">Minitaur</a> required understanding how to compose monopedal hopping primitives onto a quadrupedal robot. To do this, we built on the old idea of virtual bipeds, but in a way that would be compatible with the formal guarantees of the hybrid averaging framework that we had been publishing at about the same time:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;839d3de5-7c02-48a0-b09e-eed9a1b75fac&quot;,&quot;caption&quot;:&quot;I&#8217;ve previously written about the generative possibilities with parallel composition of reduced-order models.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Approximating cyclic dynamics utilizing symmetry&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:356074997,&quot;name&quot;:&quot;Avik De&quot;,&quot;bio&quot;:&quot;Interested in safe, efficient AI | Robotics 
founder&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30589b07-e0a0-4de5-8997-78db1ed3f65b_1290x1290.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-12-22T00:00:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!IEy8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11e6fa8-8e5e-458a-9ddd-d0edf63ae7db_800x382.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://minpower.substack.com/p/hybrid-averaging&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:182198524,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:7287367,&quot;publication_name&quot;:&quot;min{power}&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Axin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b9ffce7-a1b1-4bb3-9723-78af09a73493_608x608.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>Using simple bottom-up, decentralized, decoupled controllers, we were able to show a wide range of gaits working stably on Minitaur, with pretty good performance:</p><div id="youtube2-ijnOCQOpC7k" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;ijnOCQOpC7k&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/ijnOCQOpC7k?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h3>Virtual bipeds</h3><p>The way 
we used the idea of virtual bipeds here was to &#8220;project&#8221; the coordinates roughly along the gray arrows, and &#8220;pair&#8221; the legs projected together.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PAro!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379265d8-9c3f-4a86-b405-74481e944c38_596x472.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PAro!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379265d8-9c3f-4a86-b405-74481e944c38_596x472.png 424w, https://substackcdn.com/image/fetch/$s_!PAro!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379265d8-9c3f-4a86-b405-74481e944c38_596x472.png 848w, https://substackcdn.com/image/fetch/$s_!PAro!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379265d8-9c3f-4a86-b405-74481e944c38_596x472.png 1272w, https://substackcdn.com/image/fetch/$s_!PAro!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379265d8-9c3f-4a86-b405-74481e944c38_596x472.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PAro!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379265d8-9c3f-4a86-b405-74481e944c38_596x472.png" width="596" height="472" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/379265d8-9c3f-4a86-b405-74481e944c38_596x472.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:472,&quot;width&quot;:596,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Virtual biped&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Virtual biped" title="Virtual biped" srcset="https://substackcdn.com/image/fetch/$s_!PAro!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379265d8-9c3f-4a86-b405-74481e944c38_596x472.png 424w, https://substackcdn.com/image/fetch/$s_!PAro!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379265d8-9c3f-4a86-b405-74481e944c38_596x472.png 848w, https://substackcdn.com/image/fetch/$s_!PAro!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379265d8-9c3f-4a86-b405-74481e944c38_596x472.png 1272w, https://substackcdn.com/image/fetch/$s_!PAro!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379265d8-9c3f-4a86-b405-74481e944c38_596x472.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Virtual bipeds</figcaption></figure></div><p>After this projection, we are left with a planar bipedal system with three degrees of freedom to analyze, as shown in the right column above.</p><h3>Vertical hopper compositions</h3><p>Focusing on the bound/pronk projection, the limit cycles for these two gaits roughly resemble the flows in the following picture:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rQPj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb5f961-cf99-4b08-97d6-524e1fb50b7e_486x648.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!rQPj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb5f961-cf99-4b08-97d6-524e1fb50b7e_486x648.png 424w, https://substackcdn.com/image/fetch/$s_!rQPj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb5f961-cf99-4b08-97d6-524e1fb50b7e_486x648.png 848w, https://substackcdn.com/image/fetch/$s_!rQPj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb5f961-cf99-4b08-97d6-524e1fb50b7e_486x648.png 1272w, https://substackcdn.com/image/fetch/$s_!rQPj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb5f961-cf99-4b08-97d6-524e1fb50b7e_486x648.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rQPj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb5f961-cf99-4b08-97d6-524e1fb50b7e_486x648.png" width="278" height="370.6666666666667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0bb5f961-cf99-4b08-97d6-524e1fb50b7e_486x648.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:648,&quot;width&quot;:486,&quot;resizeWidth&quot;:278,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Vertical hopping limit cycles pronk bound&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Vertical hopping limit cycles pronk bound" title="Vertical hopping limit cycles pronk bound" 
srcset="https://substackcdn.com/image/fetch/$s_!rQPj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb5f961-cf99-4b08-97d6-524e1fb50b7e_486x648.png 424w, https://substackcdn.com/image/fetch/$s_!rQPj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb5f961-cf99-4b08-97d6-524e1fb50b7e_486x648.png 848w, https://substackcdn.com/image/fetch/$s_!rQPj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb5f961-cf99-4b08-97d6-524e1fb50b7e_486x648.png 1272w, https://substackcdn.com/image/fetch/$s_!rQPj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb5f961-cf99-4b08-97d6-524e1fb50b7e_486x648.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Bipedal &#8220;gaits&#8221; showing bounding and pronking</figcaption></figure></div><p>Looking more closely at the picture above, the axes in the plot are the &#8220;phases&#8221; of the two legs, a concept we talked about in the accompanying <a href="/hybrid-averaging">hybrid averaging</a> paper. That identification now encourages us to think about our original quadruped as a pair of (vertical) hoppers &#8220;coupled&#8221; by a body.</p><p>The &#8220;coupling&#8221; is physically instantiated by the body itself, and its inertial properties have a significant effect on the type of coupling. The <a href="https://en.wikipedia.org/wiki/Center_of_percussion">center of percussion</a> is a well-studied property of baseball bats, juggling clubs, etc. that relates an impulse at one end of the object to the wrench at a different point along the object. 
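</p><p>As a sketch of the textbook rigid-body relation (the symbols here are assumptions chosen for illustration, not necessarily the paper's notation): for a planar body of mass <em>m</em> and moment of inertia <em>I</em> about its center of mass, a vertical impulse <em>J</em> applied at a distance <em>d</em> on one side of the center of mass changes the vertical velocity of the point at distance <em>d</em> on the other side by <em>&#916;v</em> = (<em>J</em>/<em>m</em>)(1 &#8722; <em>m d</em><sup>2</sup>/<em>I</em>). When <em>I</em> = <em>m d</em><sup>2</sup>, that far point is the center of percussion and sees no instantaneous velocity change; for larger <em>I</em> it is pushed in the same direction as the impulse, and for smaller <em>I</em> in the opposite direction.</p><p>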
We used this definition:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!U_a3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0367ebe6-d03a-412a-948f-b55d89a41f04_746x454.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U_a3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0367ebe6-d03a-412a-948f-b55d89a41f04_746x454.png 424w, https://substackcdn.com/image/fetch/$s_!U_a3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0367ebe6-d03a-412a-948f-b55d89a41f04_746x454.png 848w, https://substackcdn.com/image/fetch/$s_!U_a3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0367ebe6-d03a-412a-948f-b55d89a41f04_746x454.png 1272w, https://substackcdn.com/image/fetch/$s_!U_a3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0367ebe6-d03a-412a-948f-b55d89a41f04_746x454.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U_a3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0367ebe6-d03a-412a-948f-b55d89a41f04_746x454.png" width="452" height="275.07774798927613" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0367ebe6-d03a-412a-948f-b55d89a41f04_746x454.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:454,&quot;width&quot;:746,&quot;resizeWidth&quot;:452,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Center of percussion&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Center of percussion" title="Center of percussion" srcset="https://substackcdn.com/image/fetch/$s_!U_a3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0367ebe6-d03a-412a-948f-b55d89a41f04_746x454.png 424w, https://substackcdn.com/image/fetch/$s_!U_a3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0367ebe6-d03a-412a-948f-b55d89a41f04_746x454.png 848w, https://substackcdn.com/image/fetch/$s_!U_a3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0367ebe6-d03a-412a-948f-b55d89a41f04_746x454.png 1272w, https://substackcdn.com/image/fetch/$s_!U_a3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0367ebe6-d03a-412a-948f-b55d89a41f04_746x454.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Center of percussion</figcaption></figure></div><p>Intuitively, the location of the center of percussion, which is in turn related to the mass distribution of the object, affects the type of coupling between the two hoppers.</p><h3>Pronking and bounding</h3><p>We showed, both in simulation and by physically altering the inertial characteristics of the sagittal-plane Minitaur with an &#8220;inertia bar&#8221; added to its back, that we could indeed get both bounding and pronking limit cycles by programming its front and rear ends as completely independent vertical hoppers.</p><p>This is a bit of an extreme interpretation of the style of decoupled control originally pioneered by <a href="https://mitpress.mit.edu/9780262681193/legged-robots-that-balance/">Raibert</a> and also demonstrated on <a href="/jerboa-hopping-video">Jerboa</a>, but it also has interesting implications for the possibility of <a 
href="https://en.wikipedia.org/wiki/Distributed_control_system">distributed control</a> of legged robots. Another way to think about it is that we are formalizing <a href="https://en.wikipedia.org/wiki/Preflexes">preflexes</a>, which have historically played a pivotal role in the mechanical stabilization of animal and <a href="https://www.sciencedirect.com/science/article/abs/pii/S1467803904000398">robot</a> locomotion.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><em>The <a href="https://scholar.google.com/citations?view_op=view_citation&amp;hl=en&amp;user=m-A4ZdEAAAAJ&amp;citation_for_view=m-A4ZdEAAAAJ:cWzG1nlazyYC">paper</a> corresponding to this article was published in 2018</em></p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[What are robot dogs, and what are they good for?]]></title><description><![CDATA[A brief what and why of quadrupedal robots, originally drafted for a magazine article]]></description><link>https://www.avikde.me/p/what-are-robot-dogs</link><guid 
isPermaLink="false">https://www.avikde.me/p/what-are-robot-dogs</guid><dc:creator><![CDATA[Avik De]]></dc:creator><pubDate>Sat, 21 Dec 2024 00:00:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Z7FY!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea21ccc-90aa-4750-861d-eb48a6144608_176x176.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I was recently contacted by a journalist for a magazine article about quadrupedal robots (what they are, what they are good for, etc.). It was a good exercise in processing my own thoughts, and since only a few of them made it into the publication, I&#8217;ll record them here.</p><blockquote><p>Robots tend to imitate naturally occurring, biological structures. What is it about a dog&#8217;s form that makes it ideal to roboticize?</p></blockquote><p>The field of bio-inspired robotics is the result of close collaboration between biomechanists and roboticists. It is a synergistic discipline, where biologists study animals and distill their functioning principles into lessons that engineers can use to design robots. In the other direction, machines built with simple mechanisms and programmable behaviors provide testable hypotheses that help biologists deepen their understanding of animals.</p><p>It&#8217;s important to note that useful robots are not usually created by mimicking animal form (biomimetics), but rather by taking inspiration from their working principles (bioinspiration). Engineers have access to different materials and power sources than those available to biological creatures, and animal bodies have evolved to serve many functions that are not relevant to a robot. 
For this reason, a four-legged robot may roughly resemble a dog&#8217;s form, and in particular move in a manner resembling a dog, but will not copy the intricate details of its biological structure.</p><p>Four-legged animals like cats or dogs usually walk on their toes (digitigrade), while humans and other two-legged animals walk on flat feet (plantigrade). Larger feet allow much more control over balance than toe-tips do (imagine a human standing on one leg, compared to a cat or dog). In practical terms for engineers, this means that the legs of two-legged robots are a lot more complex, with many actuators needed for the hip, knee, and ankle in order to balance. In comparison, a four-legged robot can have much simpler leg designs, with no ankle or foot needed at all. In fact, a four-legged robot typically has fewer motors than a two-legged robot, reducing cost and complexity, while having good agility and balance over difficult terrain.</p><blockquote><p>Can you walk me through how a robot dog works/functions?</p></blockquote><p>The body of a four-legged robot has four simple legs, each of which can position its toe at any point within reach. For the robot to stand in place, all four toes are in contact with the ground, and the forces generated by the motors in each leg create a reaction force on the body, thereby supporting it. When the robot needs to move, some of these legs leave the ground and swing through the air to reposition the toes in the direction of movement, or simply to find a better foothold. While the algorithms used to calculate these forces or where to reposition the legs can get quite complex, the basic working principle is simple: if you recall the last time you had to scramble up some terrain on a difficult hike, the idea of using limbs to either push against the ground, or reposition them, will seem quite familiar. 
These same principles are applied by algorithms to select when and where to reposition which legs, so that the robot stays balanced as it moves over different types of terrain.</p><p>Unlike wheeled vehicles, legged robots can move over broken or discontinuous terrain by picking isolated footholds, allowing them to go into locations without roads or trails. Other than the legs, the body of the robot contains a power source (typically a battery), computing devices to run the aforementioned algorithms, and sensors needed to sense the environment around the robot.</p><blockquote><p>The Vision series includes some of the most successful quadrupeds to date. Can you elaborate on certain design features that make it superior to other models?</p></blockquote><p>Despite the recent rise in popularity of legged robotics, we are still discovering newer and more impactful use cases for this technology, and these applications inform which features robot manufacturers prioritize.</p><p>One important feature is energetic efficiency, which needs careful attention in all building blocks of a four-legged robot, ranging from its design to the algorithms used to control its motion. In a hoped-for future where these robots proliferate across many use cases, this focus will help their utility outweigh their energy consumption. Emerging technologies like artificial intelligence (AI) are currently having to <a href="https://www.nature.com/articles/d41586-024-03408-z">contend with this cost-benefit analysis</a>, and legged robots will be no different.</p><p>It is also important, because the technology is so new, to embrace developers and empower them to customize the robot to solve new problems. A wide range of peripheral connectivity options and a software developer kit can enable, for example, the development of a <a href="https://www.instagram.com/p/DCHxxlOPjga/">hose-pointing behavior for firefighter support</a>. 
Much like an app store amplifies the utility of a smartphone, customizability will increase the utility of legged robots in niche but impactful use cases.</p><blockquote><p>What are some of the trickier aspects when it comes to developing a robot dog? What challenges are developers in this space currently facing?</p></blockquote><p>Locomotion in challenging terrain requires cutting-edge algorithms, ranging from traditional methods that leverage classical physics principles to ones that use machine learning to learn from experience and data. In either case, more sophisticated strategies may result in increased functionality, but at the expense of more computational power and greater sensitivity to changes in the robot hardware, operating environment, or payload. Balancing the utility of an algorithm against its computational footprint and sensitivity is a difficult task for roboticists.</p><p>It is also challenging to increase the battery life of a legged robot. It needs to carry its power source, and so while installing a larger battery would increase battery life, doing so may introduce other issues such as reduced payload capacity. Improvements in battery energy density and microprocessor computational efficiency are innovations that are currently outside of the scope of our in-house research team, and the efficiency of electric motors for legged robots is limited by the fundamentals of electromagnetics. This leaves robot designers with difficult optimization problems in mechanical design and algorithm selection as some of the only tools available to increase running time.</p><p>Lastly, legged robots are complex electromechanical machines, currently being designed and built by research-oriented organizations that have only limited manufacturing experience. 
Eventually, legged robots will be mass-produced with the same reliability and cost-effectiveness that we have come to expect from automobiles or household appliances, but getting to that point requires scale that will only come with time and expanding applications for these robots.</p><blockquote><p>Future of robot dogs: As robotics and AI become more integrated into industry and even daily life, how do you see robot dogs being applied in the near future? Especially considering the wide variety of attachment possibilities.</p></blockquote><p>Four-legged robots are very versatile platforms that exhibit good mobility, payload capacity, and range for a given size and weight. This means that they are able to carry sensors and payloads (attachments) for long missions, in built environments including stairs, or in rugged outdoor terrain. This makes them well suited for automating tasks such as <a href="https://www.tyndall.af.mil/News/Article-Display/Article/2550793/tyndall-brings-in-the-big-dogs/">security</a> and equipment inspection, surveying, and <a href="https://youtu.be/vpVlX1z4sFs?si=J7DdRv9tut0Z1S82">mapping</a>. These jobs currently involve workers going into remote locations or hazardous environments, or performing repetitive tasks, and are ripe for automation. 
Automated inspection and security call for payloads such as high-resolution, thermal, electro-optical, or low-light cameras, acoustic imagers, and hazardous gas sensors.</p><p>These robots also provide a great deal of utility as the eyes and ears of first responders, going first into dangerous situations such as <a href="https://www.overtdefense.com/2024/01/19/japans-ground-self-defense-force-deploys-robot-dogs-to-aid-earthquake-relief-efforts/">disaster</a> <a href="https://www.cincinnati.com/story/news/2024/11/10/daniel-carter-beard-bridge-fire-odot-uses-robodog-to-assess-damage/76126394007/">response</a>, and response to chemical, biological, radiological, nuclear, or explosive ordnance (CBRNE). For these applications, payloads such as multi-gas detectors, Raman spectrometers, radiation detectors, and explosive trace detectors are useful.</p><blockquote><p>Why is this tech important?</p></blockquote><p>As the focus grows on the hazardous and unsafe conditions faced by workers across the world, it will be important for us to come up with machines and tools to perform these dull, dirty, and dangerous tasks in their stead. Parallel to the rapidly increasing utility and footprint of informational automation with AI, there is a need for automation of physical work, such as carrying sensors and other payloads, accessing dangerous locations, and automating mundane physical tasks. Four-legged robots are a flexible and versatile platform that has the potential to be adapted to all of these use cases in a cost- and energy-efficient manner.</p>]]></content:encoded></item></channel></rss>