5 Comments
Roy Xing:

I’m curious about your thoughts on having more layers instead of a pure two-level HL -> LL framework. It seems like humans do something like this with the cortex -> motor cortex -> brain stem/spinal cord. It’s interesting to see that Figure adopted this kind of hierarchy; any thoughts on the pros/cons of splitting the layered control architecture even further?

Also, for what it’s worth, I would vote for an Isaac Sim implementation, since it might be easier to use an RL pipeline that’s already bundled together with active developer support than to piece together your own RL stack, sim, evals, etc. But idk, it is always satisfying to build something from scratch haha

Avik De:

Agreed - more hierarchical layers mean more potentially performance-limiting interfaces, but also potentially better performance and debuggability. Another way to think about it: what happens if the higher level is switched off (maybe it needs to rethink or “reason”, in LLM terms)? It is nice if the lower levels can independently prevent safety issues by at least taking care of balance and other safety concerns.
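A minimal sketch of that fallback idea (all names, the command format, and the staleness threshold are made up for illustration): a low-level loop tracks the most recent high-level command, but reverts to a balance-only behavior whenever the command goes stale, e.g. because the higher layer is busy "reasoning".

```python
class LowLevelController:
    """Always-on low-level layer: executes the latest high-level command,
    but falls back to a safe balance-only mode if that command goes stale."""

    def __init__(self, stale_after_s=0.1):
        self.stale_after_s = stale_after_s  # how long a command stays valid
        self._cmd = None
        self._cmd_time = -float("inf")

    def set_command(self, cmd, now):
        # Called (possibly infrequently) by the high-level layer.
        self._cmd = cmd
        self._cmd_time = now

    def step(self, now):
        # Called at the fast control rate. If the high level is silent,
        # don't blindly execute a stale plan; just keep the robot balanced.
        if self._cmd is None or now - self._cmd_time > self.stale_after_s:
            return {"mode": "balance", "velocity": 0.0}
        return {"mode": "track", "velocity": self._cmd["velocity"]}


ctrl = LowLevelController(stale_after_s=0.1)
ctrl.set_command({"velocity": 0.5}, now=0.0)
print(ctrl.step(now=0.05)["mode"])  # fresh command -> track
print(ctrl.step(now=0.30)["mode"])  # stale command -> balance
```

The key design point is that the safety-critical behavior lives entirely in the lower layer, so the interface to the higher layer can fail (or pause) without compromising balance.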

Roy Xing:

Yeah, I was thinking of that too! Especially after seeing Matt Mason's Inner Robot blog post [https://mtmason.com/the-inner-robot/] and some more neuroscience evidence of discrete functional structures, even at the high level [https://www.cambridge.org/zw/universitypress/subjects/life-sciences/animal-behaviour/divided-brains-biology-and-behaviour-brain-asymmetries?format=PB]. I'm not fully convinced by arguments from pure-learning people claiming that any engineered structure will always become the bottleneck; it seems clear that biological systems use this kind of structure to efficiently turn complex information flows into actions.

I'm looking forward to your third installment!

Bharath Suresh:

From what I understand:

Skill Acquisition -> Runs offline during Training

Motor Adaptation -> Runs on the robot's compute

Is there something that runs offline, but after the initial training phase?

For example, a robot sends data about a new environment after a few hours to an offline computer, which then provides some feedback back to the robot as it continues in that environment.

Similar to how auto companies provide "OTA software updates" to your car even after you bought it, to fix or improve something.

Avik De:

Great question! Yes, there are machine-learning approaches for fine-tuning or adapting models with new data, e.g. fine-tuning on the new data directly, LoRA (low-rank adaptation), and meta-learning. This kind of incremental training is typically centralized (the model provider would do it), but there is also the concept of federated learning, where it can be done with private data at customer sites. However, to my knowledge, all of these are still relatively niche in the context of large foundation models.
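The core of LoRA can be sketched in a few lines (this is an illustrative toy with made-up dimensions, not any particular library's implementation): the pretrained weight W stays frozen, and only a small low-rank correction B @ A is trained on the new data.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 8, 2

# Frozen pretrained weight (stands in for one layer of a large model).
W = rng.standard_normal((d_out, d_in))

# Low-rank adapter: only A and B would be updated during adaptation.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))  # zero init, so the adapter starts as a no-op

def forward(x, scale=1.0):
    # Effective weight is W + scale * (B @ A); W itself never changes,
    # so the "OTA update" only needs to ship the small A and B matrices.
    return (W + scale * (B @ A)) @ x

x = rng.standard_normal(d_in)
# With B = 0, the adapted model matches the base model exactly.
assert np.allclose(forward(x), W @ x)
print("adapter params:", A.size + B.size, "vs full layer:", W.size)
# → adapter params: 32 vs full layer: 64
```

At realistic model sizes the savings are far larger than this toy suggests, since the adapter's parameter count grows with rank * (d_in + d_out) rather than d_in * d_out, which is what makes shipping per-deployment updates cheap.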