Thoughts on Gemini Robotics-ER 1.5: the “trick” is cool. Generate the probable next action as words, then call the exact movements for the robot arm. But why does no one see the obvious problems?

Try a simple game: your kid (or your neighbour's kid) describes what he sees, and you tell him what to do based on that. How far can you get in making coffee if the kid has never done it?