Panelist: Mark Liberman
Title: Three Steps Towards Real Artificial Speech Communication

1. Robust diarization, i.e. "who spoke when". Despite decades of effort, the best diarization systems have trouble with overlapping speech and noisy backgrounds.

2. Dialogue systems with human-like turn-taking. People break into others' turns for many reasons, cooperative as well as competitive. Dialogue systems typically interpret interruptions as reset signals, which they usually aren't. And systems generally don't speak during an interlocutor's turn, not even for backchannels, much less to correct misunderstandings, introduce relevant information, or cut to a proposed solution.

3. Conversational systems that can participate effectively in a meeting, manage a classroom, or chat over a game of poker. Such systems need to keep track of what is happening in the physical and social context, as well as who said what when. They need to learn how and when to contribute, and how to modulate their contributions dynamically as a function of others' interruptions. And they need to be able to adapt to different conversational cultures.