Boris Sofman taps his phone, and the robot on the conference room table in front of him wakes up. Not in that gadget-y way, like when a laptop screen turns on, though. The robot slowly raises its head and opens one eye, then the other, as if the light of the world is just too much. Sofman, CEO of robotics company Anki, chuckles as it shakes off the rust of sleep and ambles off its charging cradle. After circling the table a moment, it drives quickly to the edge. It only pauses once it’s driven halfway off. I instinctively put my hand out to catch it, but a split second later, the robot looks down, and its blue OLED eyes go wide. Sofman smiles. The robot yelps in a tiny robot voice, flails its single, u-shaped arm in terror, quickly reverses its bulldozer-style tracks, and backs away.
“This little guy,” Sofman says, “is Cozmo.”
Three years ago, Sofman took the stage at Apple’s WWDC keynote and demonstrated Anki Drive, a set of artificially-intelligent race cars. That was Anki’s first product. Cozmo is its second–third if you count a major update to Drive–launching today after five years in development. It is a $180, coffee-mug-sized, vehicular robot–like a cross between a Furby and a Tonka Truck. Like Drive, Cozmo is a toy. It’s meant largely for children, and it is adorable.
With most toys, Sofman says, it’s up to the humans playing with them to provide creativity. Your C-3PO doesn’t act like C-3PO unless you do all the work. “Here,” Sofman says, “we can actually make him true to form.” Thanks to a wholly unique combination of computer-vision science, advanced robotics, deep character development, and a set of machine-learning algorithms that Anki calls the “emotional engine,” Cozmo is meant to be something very much like a real-life version of Wall-E or R2-D2. It’s not human, but it feels real.
If Anki can actually pull off this difficult mix of high-tech and kid-friendly, Cozmo could be much more than the next Tickle Me Elmo or Furby. Next time a Pixar movie comes out, the cute characters could feel as alive in your living room as they do on-screen. Sofman and his team are offering SDKs for nearly every part of Cozmo, and they imagine children learning to program by building fun new games and features for their adorable robots. “With enough attention and love to it,” Sofman says, “this could be the most capable STEM platform that ever existed.” They’ll be updating the robot’s software constantly with new games and capabilities. It may look and act like a toy, but Anki wants it to be the next big thing in hardware computing.
The hoped-for next big thing in hardware computing is currently staring at me, with two blue OLED eyes wide and unblinking. “Oh,” Sofman says. “He wants to meet you.”
The Anki crew has been thinking about Cozmo since well before Sofman’s Drive demo at WWDC. From 2005 to 2010, Sofman was a PhD student in the well-regarded robotics program at Carnegie Mellon (which is now maybe most famous for being the place Uber raided for talent when it started developing self-driving cars). He, along with classmates Mark Palatucci and Hans Tappeiner, wanted to do something unique with their research.
“They came in,” says stadium-filling venture capitalist Marc Andreessen, recalling his first meeting with the team in 2011, “and they basically said, ‘We have these backgrounds in robotics, and we could spend our lives building these sort of AI robots that cost millions of dollars and work on assembly lines.’” That’s what just about everyone with a robotics PhD does. “But they said, ‘We really think that’s inadequate. This technology is ready to be shrunk down into products that cost hundreds of dollars, and are able to be in the home.’” They showed Andreessen a working version of Drive, plus renderings of what would become Cozmo. Andreessen led a huge funding round, and now sits on Anki’s board. He calls Anki “the best robotics startup I have ever seen.”
Once Drive launched in 2013, work on the little robot began in earnest. Anki’s first Cozmo hire was Andrew Stein, another Carnegie Mellon PhD (this is a recurring theme), to work on computer vision. One nice thing about Drive was that cars were moving on a track, which Anki used to map their position. “We don’t have that with Cozmo,” Stein says. Anki did consider building the robot with a little play-mat, but Stein says “it takes away from the product. It feels less like a little creature if that little creature can only run around on the mat he comes with.” Cozmo, then, had to be able to constantly map its surroundings and navigate through them. Cozmo is solving the same kinds of problems as Google’s self-driving cars. They’re hard problems: “The number one challenge” for a home robot, says Chelsea Finn, a PhD researcher at Berkeley, “is to see the unstructured environment, and make actions depending on the state of the environment.” Luckily, researchers are coming up with answers. “There have been these huge leaps and bounds in computer vision with deep learning,” Finn says, “and hopefully we can make real progress.”
Cozmo is fully based on computer vision and deep learning. The robot sees the world through a single camera in its face, hidden in a slot that’s meant to look like a mouth. The camera runs at 15 frames per second, sending the footage to your phone, which does all the processing before sending instructions back to the robot. So Cozmo will always have as much processing power as that fancy new computer in your pocket. The downside, of course, is that you need a phone nearby when you’re playing with the li’l bot. The phone trick didn’t solve all of Anki’s problems, either: Stein spent years working out how to compensate for the latency that comes with sending data back and forth.
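The article doesn’t say how Stein’s team handles that latency, so what follows is only a guess at the general shape of the problem. One textbook approach is dead reckoning: by the time the phone has processed a frame, the robot has already moved, so the phone estimates where the robot is now before deciding what to tell it. The function name, units, and numbers below are hypothetical; only the 15-frames-per-second figure comes from Anki.

```python
import math

FRAME_INTERVAL_S = 1 / 15  # camera rate quoted in the article


def predict_pose(x, y, heading, speed, turn_rate, latency_s, dt=0.005):
    """Estimate the robot's current pose from a stale one.

    Hypothetical helper, not Anki's code: integrates forward in small
    steps, assuming speed (m/s) and turn_rate (rad/s) stay constant
    for the duration of the delay.
    """
    remaining = latency_s
    while remaining > 1e-12:
        step = min(dt, remaining)
        x += speed * math.cos(heading) * step
        y += speed * math.sin(heading) * step
        heading += turn_rate * step
        remaining -= step
    return x, y, heading


# A frame that is three frame-intervals old, robot rolling straight ahead.
stale_by = 3 * FRAME_INTERVAL_S
x_now, y_now, heading_now = predict_pose(0.0, 0.0, 0.0, 0.1, 0.0, stale_by)
```

The phone would then plan against the predicted pose rather than the one in the old frame. A real system would also have to cope with the robot speeding up, slowing down, or getting picked up mid-move.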
It would be impossible to hard-code every imaginable playplace into the system, which is why machine learning has become such a crucial part of Anki’s efforts. “A lot of situations where you invoke machine learning,” says Michael Wagner, a CMU robotics researcher who amazingly is not involved with Anki, “are because you don’t really understand what the system should do. How should it prefer to drive over rough terrain versus smooth terrain? You don’t know. So you throw machine learning at it.” Lots of testing, lots of training, and the system figures out how to react by itself. Many of the problems Anki’s dealing with are standard robotics challenges, but no one’s ever solved them for this kind of product. This robot doesn’t have to be perfectly efficient, like an assembly-line worker. This robot has to be fun.
Would You Like to Play a Game?
The key, Anki discovered, was to take everything Cozmo needs to do and somehow make it part of its character. It’s a toy, after all. Nobody wants to read a manual, or put their cute little robot down on the floor and wait ten minutes while it meticulously maps its surroundings. So early on, Anki decided that Cozmo should appear to be curious: Put it down and it’ll instinctively start looking around. It’s also a pathological show-off, which is both delightfully silly and a perfectly practical way to teach users about Cozmo’s many features. The wee bot will grab one of its blocks, put it down in front of you, and announce a desire to play a game.
Cozmo’s favorite initial game seems to be a color-matching competition called Speed Tap. You and Cozmo each take a block, and when their blinking colors match, whoever taps their block first wins. (It’s way more fun than it sounds, I swear.) Cozmo is a cheeky gamer; the little scamp tried to fake me into tapping my block when they didn’t match, and stormed off when I won. And it’s those little tics, the banging of its lift-like arm and spinning in circles and squawking in its Wall-E voice, that really make you want to refer to the little guy as “he” rather than “it.”
To give these emotions and reactions visceral weight, Anki created what it calls an emotional engine: a collection of algorithms that affect the way the robot mimics feelings. The Anki team did a lot of research on emotions, focusing especially on the so-called Core Emotions written about by psychologist Paul Ekman and portrayed in movies like Inside Out.
Almost like a human, Cozmo is aware of many potential emotional responses at any given time, all vying for a ride on that bit of data that shoots from the inside of its processor to the robot’s means of expression. Take that moment when it rolled up to the lip of the conference room table, for instance. “He sees an edge,” Sofman says, “and that spikes a response in him.” He points to a real-time graph of all of Cozmo’s available states. “He’s a little less brave, a little less calm, a little less happy.” It’s a very sophisticated imitation of nervousness. When Cozmo sees a block, those same variables combine to simulate excitement and confidence. And if it tries and fails to pick a block up, it will “get” sad and apprehensive. You can see it in the graph, and you can see it in Cozmo.
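For the curious, here is a toy version of what a graph like that might be tracking. It is a sketch, not Anki’s emotional engine: the three variable names come from Sofman’s description, while the event names, the 0-to-1 scale, and every number are invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from events to mood changes; values are made up.
EVENT_EFFECTS = {
    "saw_edge": {"brave": -0.3, "calm": -0.3, "happy": -0.1},
    "saw_block": {"brave": 0.2, "happy": 0.3},
    "failed_pickup": {"brave": -0.2, "happy": -0.3},
}


def clamp(value, low=0.0, high=1.0):
    """Keep a mood level inside the 0-to-1 scale."""
    return max(low, min(high, value))


@dataclass
class Mood:
    levels: dict = field(
        default_factory=lambda: {"brave": 0.5, "calm": 0.5, "happy": 0.5}
    )

    def apply(self, event):
        """Spike or dip each level that the event touches."""
        for name, delta in EVENT_EFFECTS.get(event, {}).items():
            self.levels[name] = clamp(self.levels[name] + delta)

    def decay(self, rate=0.1):
        """Drift every level back toward a neutral 0.5."""
        for name, value in self.levels.items():
            self.levels[name] = value + (0.5 - value) * rate


mood = Mood()
mood.apply("saw_edge")  # a little less brave, calm, and happy
```

Calling `decay` on a timer would let the nerves wear off after the robot backs away from the table edge, which is one plausible way to get a graph that spikes and settles.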
As important as it was for Anki to create recipes for emotional expression, the team also had to make sure that expression wasn’t… robotic. Cozmo might react the same way twice, but it’s programmed to be unpredictable. “Kids would very quickly realize if, like, it was a ‘do A get B’ scenario,” says engineer Brad Neuman. “Kids are smart.”
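Anki hasn’t said how it avoids the “do A get B” trap. One simple way to get that effect, offered here purely as an assumption, is to give each possible reaction a weight and roll the dice. The reaction names and weights below are hypothetical.

```python
import random


def pick_reaction(candidates, rng=random):
    """Choose one reaction at random, favoring higher weights.

    candidates is a list of (name, weight) pairs. This is an
    illustrative sketch, not Anki's selection logic.
    """
    names = [name for name, _ in candidates]
    weights = [weight for _, weight in candidates]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical reactions to losing a round of Speed Tap.
LOST_GAME = [("storm_off", 0.6), ("bang_lift", 0.3), ("spin_in_circles", 0.1)]

reaction = pick_reaction(LOST_GAME)
```

The same event usually triggers the likeliest response, but not always, so a kid can’t treat the robot like a vending machine.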
The way Cozmo shows these feelings, of course, is every bit as important as having them in the first place. For that, Anki hired the man they now call “the soul of Cozmo,” Carlos Baena. Baena spent a decade at Pixar, animating a bunch of minor characters you may have heard of, like Wall-E, Nemo, Mr. Incredible, and Buzz Lightyear. During one of their initial meetings, Sofman and Tappeiner showed Baena a smartphone video of a 3D-printed prototype interacting with an Anki employee. All Baena could look at was the employee’s face. “He just kept talking to [Cozmo], you know?” he says. “Sometimes, even cursing at him. It was past being a cute little thing. It felt deeper.” He saw in Cozmo the possibility for a connection beyond anything you can get through a movie-theater screen.
Baena and his team created a nonsensical language and chipper voice that is still somehow communicative, the same way R2-D2 managed to say so much with beeps and bloops. They spent months looking at the eyes of cartoon characters, learning how and what they can communicate. They wrote an original score for Cozmo, which plays from your phone as he motors around. Just for effect, Neuman showed me the same Cozmo motions without the animations and music. It was nifty, like a remote control car, but lifeless. Then the eyes came alive and the sounds began, and Cozmo was back.
Anki’s still working out some bugs, trying to figure out the difference between delightful unpredictability and actual buggy software. But even this fall, when Anki launches Cozmo to the world, it won’t be finished–it can’t be. Because, no matter how sophisticated the robot’s programming is, it can’t actually re-program itself. Not even Google’s AlphaGo can do that. At least for now, AI engineers must retrain systems with new machine learning algorithms or new data before they can truly operate in new ways. Anki plans to keep updating Cozmo so that you never run out of things to do with your crazy little robot.
“We’d like to get it to the point where it literally does new things every day,” Andreessen says. “We want it to be programmable.” They hope that other companies will build robots like Cozmo, maybe even robots that are aware of Cozmo and want to be friends. This real-world video game might just take over the world. Or maybe it won’t. And then Cozmo will be all alone, and sad–at least, it’ll appear that way.