As AI has skyrocketed in popularity in recent years, various evolutions of the technology, such as AI agents, have become massively popular. The latest buzzy trend is Physical AI, and Microsoft is jumping into the fray.
Broadly speaking, Physical AI can be described as hardware that goes beyond what robots already do by perceiving the environment and then using reasoning to perform or orchestrate actions agentically. Rho-alpha, Microsoft’s first set of robotics models, translates spoken commands into actions for robotic systems performing bimanual manipulation tasks, meaning tasks that require using both hands at once.
The models, derived from Microsoft's Phi series, go a step beyond traditional vision-language-action models (VLAs) by adding tactile sensing, or the ability to understand physical cues. For instance, the company shares that efforts are underway to enable the models to sense modalities such as force. That capability could be helpful in real-world scenarios, such as stopping a movement if someone is in the way.
Rho-alpha also enables robots to learn from feedback given by people, which allows them to keep learning on the job, much like a person. Ultimately, Microsoft says the goal is to make physical systems more adaptable, adjusting to both their environment and people’s requests, and therefore more trustworthy. To achieve this goal, Rho-alpha was trained on physical demonstrations, simulated tasks, and web-scale visual question-answering data, according to the company.
Lastly, Microsoft is tackling the scarcity of high-quality simulated data that accurately captures reality by having its training pipeline generate synthetic data using the open-source NVIDIA Isaac Sim framework. Those interested in using physical AI foundations and tools can join Microsoft's Research Early Access Program.




