Panelist: Margaret Mitchell
Title: Strong Baselines, Evaluation and the Role of the Humans in Grounded Language Generation

This talk will cover some of the tension between publishing compelling NLP results and performing careful evaluation with best-effort testing of reasonable baselines. I will discuss work that leverages human input alongside four different approaches to generation in NLP: syntax, HMMs, n-gram language modeling, and recurrent neural language modeling. While all of the above can be used effectively to produce solid results in NLP, the conclusions we draw are not always objectively supported by the evaluations we run and the baselines we compare against. On the other hand, doing justice to baselines and evaluation can lead to papers that pack much less of a punch, a tricky issue in everyone's pursuit of sharing high-quality research.