New Forces in Embodied AI: Former Meituan Delivery Head Enters the Arena
Wang Puzhong, former head of Meituan Delivery, has entered the embodied AI field, founding Yuanjie Intelligence and completing a multi-million yuan seed round. This is noteworthy not as another "executive starts a company" story, but because it reflects a trend: more and more people with real-world operational experience are shifting from algorithms in the virtual world to robots in the physical world. This cross-pollination of expertise between the digital and physical worlds may be exactly what the robotics industry needs to break through its current limitations.
Why Embodied Intelligence?
Embodied AI, simply put, means giving AI a body -- the ability to operate in the physical world. Unlike the chatbots and recommendation algorithms that dominate the current AI landscape, embodied AI aims to create machines that can see, think, and act in the real world just like humans do.
Previous AI mostly operated in virtual spaces -- processing text, generating images, making recommendations. These applications have grown incredibly sophisticated, but they're fundamentally limited to the digital domain. Embodied AI tackles more practical problems: enabling robots to manipulate objects in a warehouse, navigate a home environment, perform surgery, inspect industrial equipment, and even assist elderly people with daily tasks.
The appeal of this direction lies in its very concrete application scenarios: logistics warehousing, home services, medical care, industrial inspection, agriculture, construction, and more. Each of these addresses real-world needs, and many face labor shortages that make automation not just economically attractive but practically necessary. The global warehouse automation market alone is projected to reach over $40 billion by 2028.
Wang Puzhong's Cross-Industry Logic
The leap from food delivery to embodied intelligence might seem large, but there's a clear internal logic.
Food delivery is essentially a "complete tasks in the physical world" problem: how to plan routes through crowded cities, avoid obstacles in real-time, deliver items within strict time windows, and handle unexpected situations like weather changes, traffic jams, or building access problems. These problems overlap heavily with what embodied AI robots need to solve -- navigation, obstacle avoidance, dynamic environment adaptation, and reliable real-world performance.
Meituan's years of accumulated delivery experience -- urban road layouts, indoor building environments, dynamic obstacle patterns, time constraint optimization, weather impact on operations -- has significant value in the embodied AI field. It's not that the data can be used directly (delivery robots face very different challenges from warehouse robots), but rather that it brings a deep, practical understanding of "how difficult physical world tasks actually are."
Wang has mentioned in interviews that his experience running thousands of delivery stations taught him that the gap between "AI that works in a demo" and "AI that works in reality" is enormous. This pragmatic perspective -- understanding the messy, unpredictable nature of the physical world -- is exactly what the embodied AI industry needs.
Multimodal Navigation: The "Eyes" of Embodied Intelligence
A core technology for embodied AI is multimodal navigation -- enabling robots to understand their surroundings like humans do, combining information from multiple senses simultaneously.
Traditional robot navigation relies mainly on LiDAR or visual SLAM (Simultaneous Localization and Mapping). These perform well in structured environments (like factory assembly lines), but struggle in complex, real-world settings. Move a shelf, add an obstacle on the floor, change the lighting -- and traditional solutions start failing. They lack the adaptability that humans take for granted.
The multimodal navigation approach fuses data from multiple sensors (cameras for visual understanding, LiDAR for depth measurement, IMU for motion tracking, and even tactile sensors for contact detection) and combines it with the semantic understanding capabilities of large language and vision models. This enables robots to know not just "where am I" but also "what's around me," "what are those objects," and "what should I do next."
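To give a feel for what "fusing" sensor data means at the lowest level, here is a minimal sketch that combines two noisy distance readings to the same obstacle, one from a LiDAR and one from a camera depth estimate. It uses inverse-variance weighting, a standard textbook technique chosen here purely for illustration; it is not a description of any particular company's navigation stack. The function name `fuse_ranges` and all the numbers are hypothetical.

```python
def fuse_ranges(z_a, var_a, z_b, var_b):
    """Fuse two independent range measurements by inverse-variance weighting.

    z_a, z_b     -- measured distances in meters (e.g. LiDAR and camera depth)
    var_a, var_b -- assumed noise variances of each sensor, in m^2
    Returns (fused_estimate, fused_variance).
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    # The less noisy sensor gets the larger weight.
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused = (w_a * z_a + w_b * z_b) / (w_a + w_b)
    # The fused variance is smaller than either input variance.
    fused_var = 1.0 / (w_a + w_b)
    return fused, fused_var


if __name__ == "__main__":
    # Hypothetical readings: a precise LiDAR and a noisier camera depth estimate.
    estimate, variance = fuse_ranges(2.00, 0.04, 2.30, 0.25)
    print(f"fused distance: {estimate:.3f} m, variance: {variance:.4f} m^2")
```

The fused estimate lands closer to the more trustworthy sensor, and its uncertainty is lower than that of either sensor alone. Real systems layer much more on top of this (time synchronization, calibration, semantic labels from vision models), but the underlying idea of weighting each source by how much it can be trusted is the same.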
The technical progress in this area has been genuinely fast. Robots that previously needed pre-built maps to navigate can now understand and make decisions in unfamiliar environments in real time. A robot entering a new warehouse for the first time can now build a map on the fly, identify obstacles, and plan a path -- all without human intervention. This is a qualitative leap from just two years ago, when most commercial robots still required extensive environment-specific configuration.
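The "build a map on the fly, identify obstacles, plan a path" loop can be sketched in a few dozen lines. The toy below is an assumption-laden simplification, not how any production robot works: the world is a small grid, the robot can only sense the four cells next to it, unknown cells are optimistically treated as free, and planning is plain breadth-first search. The names `plan_path` and `navigate` are made up for this example.

```python
from collections import deque

NEIGHBOR_OFFSETS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def plan_path(grid, start, goal):
    """Breadth-first search on a 4-connected grid (0 = free, 1 = obstacle).

    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in NEIGHBOR_OFFSETS:
            nxt = (r + dr, c + dc)
            in_bounds = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if in_bounds and grid[nxt[0]][nxt[1]] == 0 and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)
    return None


def navigate(true_grid, start, goal):
    """Walk from start to goal with no prior map, replanning after every step.

    Returns the list of cells visited, or None if the goal proves unreachable.
    """
    rows, cols = len(true_grid), len(true_grid[0])
    known = [[0] * cols for _ in range(rows)]  # unknown cells assumed free
    pos = start
    visited = [pos]
    while pos != goal:
        # Sense: record the true occupancy of the four adjacent cells.
        for dr, dc in NEIGHBOR_OFFSETS:
            nr, nc = pos[0] + dr, pos[1] + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                known[nr][nc] = true_grid[nr][nc]
        # Plan on the map built so far, then take a single step.
        path = plan_path(known, pos, goal)
        if path is None:
            return None
        pos = path[1]
        visited.append(pos)
    return visited


if __name__ == "__main__":
    warehouse = [
        [0, 0, 0, 0],
        [1, 1, 1, 0],  # a shelf row the robot does not know about in advance
        [0, 0, 0, 0],
    ]
    print(navigate(warehouse, (0, 0), (2, 0)))
```

The robot initially plans straight through the shelf row, discovers each blocked cell only when it is next to it, and keeps replanning until it finds the gap at the end of the row. Real systems replace the grid with SLAM-built maps and the neighbor check with fused sensor data, but the sense, update, replan cycle is the core idea.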
The Real Challenges of Embodied Intelligence
Despite rapid progress, embodied AI still has a long way to go before large-scale commercial deployment.
Hardware costs. A robot capable of complex tasks still carries significant hardware costs. Precision robotic arms, advanced sensor suites, powerful onboard computing units -- none of them are cheap. While costs have dropped considerably (a decent robot arm that cost $50,000 five years ago can now be had for under $10,000), the total cost of a fully equipped robot is still prohibitive for many use cases. The industry is working toward the "$5,000 capable robot" milestone, which would open up entirely new market segments.
Reliability. In the virtual world, an AI mistake might mean a wrong reply or an irrelevant recommendation. In the physical world, an AI mistake could mean a robot crashing into something, damaging products, hurting someone, or destroying itself. The reliability bar is completely different. A chatbot that's right 95% of the time is excellent. A robot that's right 95% of the time is dangerously unreliable -- the 5% of failures could cause physical harm. Achieving 99.9%+ reliability in unstructured environments remains an enormous engineering challenge.
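A rough calculation shows why the same percentage means such different things for a robot. The sketch below assumes, as a simplification, that a task is a chain of independent steps that must all succeed; the step counts and action volumes are illustrative numbers, not industry data, and the function names are hypothetical.

```python
def task_success_probability(step_reliability, steps):
    """Probability that every step in a chain succeeds, assuming independence."""
    return step_reliability ** steps


def expected_failures(step_reliability, actions):
    """Expected number of failed actions out of a given total."""
    return (1.0 - step_reliability) * actions


if __name__ == "__main__":
    for reliability in (0.95, 0.999):
        p_task = task_success_probability(reliability, steps=20)
        failures = expected_failures(reliability, actions=1000)
        print(
            f"per-step reliability {reliability:.1%}: "
            f"20-step task succeeds {p_task:.1%} of the time, "
            f"about {failures:.0f} failures per 1,000 actions"
        )
```

Under these assumptions, a robot that is right 95% of the time per step finishes a 20-step task only about 36% of the time and fails around 50 times per 1,000 actions. At 99.9% per step, the same task succeeds about 98% of the time, with roughly one failure per 1,000 actions. Errors compound in the physical world in a way that a single chatbot reply never has to face.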
Generalization. A robot that performs well in the lab might completely fail in a real environment. The real world is far more complex than any lab -- lighting changes throughout the day, floor surfaces vary, objects are positioned unpredictably, humans behave erratically, and unexpected events occur constantly. Building systems that handle this diversity gracefully is one of the hardest problems in robotics. Every new environment brings new edge cases that weren't anticipated during development.
Data acquisition. Text data on the internet is virtually unlimited -- we have billions of web pages, books, and articles. Image and video data is abundant too. But the "physical world data" that robots need to learn from is much harder to acquire at scale. You can't just scrape the internet for data about how objects feel when you grip them, or how a robot arm should adjust its trajectory when it encounters unexpected resistance. This is why many companies invest heavily in simulation environments to train robots, but the gap between simulation and reality (the "sim-to-real gap") remains a persistent challenge.
A Few Judgments
Embodied intelligence is the next important direction for AI, but it won't explode overnight. The technology is advancing quickly, but from technology to product to mass commercialization, each step takes time. We'll see gradual adoption in controlled environments first (warehouses, factories) before open-world applications become common.
Multimodal navigation is a foundational capability for embodied intelligence. If a robot can't reliably move through an environment, other capabilities are moot. Breakthroughs in this area will drive the entire industry forward. Companies that solve navigation first will have a significant advantage.
Embodied intelligence will develop gradually, not through a sudden revolution. It won't abruptly reach a point where robots can do everything. Instead, robots will progressively replace humans in specific scenarios and specific tasks: first simple, repetitive tasks in controlled environments, then more complex tasks in slightly less controlled environments. Step by step, year by year.
China has unique advantages in embodied intelligence. A strong manufacturing base, complete supply chains, abundant application scenarios (China is the world's largest market for industrial robots), and a growing talent pool of robotics engineers are all favorable conditions. Chinese companies are also willing to deploy robots in real-world settings more aggressively than companies in some other markets.
The convergence of AI models and robotics is the key trend. The recent advances in large language models and vision-language models are directly applicable to robotics. A robot that can understand natural language instructions and reason about its environment using a powerful AI model is far more useful than one that only follows pre-programmed scripts. This convergence is what makes the current moment exciting for the embodied AI industry.
Wang Puzhong's entry is just one signal of this sector heating up. The embodied intelligence story is just beginning -- the truly exciting parts are still ahead. As more experienced operators from traditional industries bring their real-world expertise to robotics, and as AI models become more capable of understanding the physical world, we'll see robots move from the factory floor into our daily lives. It's not a matter of if, but when -- and the "when" is getting closer every day.