Introduction

In fields such as market research, the social sciences, political science, and customer relationship management, data are often collected through surveys, which are conducted by a survey specialist and involve a number of respondents (de Vaus, 2014). A survey usually relies on a questionnaire, that is, a list of questions that respondents are asked to answer. Most questions in a questionnaire are of the "closed" type, where the respondent is required to tick one of a predefined set of answers. Open (a.k.a. "open-ended") questions instead require the respondent to return a textual answer whose length is not specified a priori. When the results of the survey are computed, closed and open answers require very different amounts of processing: closed answers simply involve checking which (or how many) respondents have picked which predefined options, whereas open answers require more complex analysis. To handle open answers, the survey specialist first defines a classification scheme, that is, a set of classes of interest for the given application (e.g., 'BadCustomerSupport', 'IssuesWithWebsite', etc., for a customer satisfaction survey run by a telecom company), and then classifies each answer based on its textual content, i.e., attributes to it one or more classes from the classification scheme. The results of the survey are then obtained by checking which (or how many) respondents' answers have been attributed which class.
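The final counting step can be illustrated with a minimal Python sketch. It assumes that closed answers arrive as a list of the options respondents picked. It also assumes that open answers have already been classified, with each respondent ID mapped to the set of classes attributed to that respondent's answer. The functions `tally_closed` and `tally_open`, the respondent IDs, and the extra class `HighPrices` are all hypothetical, introduced only for illustration.

```python
from collections import Counter

# Hypothetical classification scheme for a telecom customer satisfaction survey.
# 'HighPrices' is an illustrative class not taken from the text.
SCHEME = {"BadCustomerSupport", "IssuesWithWebsite", "HighPrices"}


def tally_closed(choices):
    """Count how many respondents picked each predefined option."""
    return Counter(choices)


def tally_open(coded_answers, scheme):
    """Count how many respondents' answers were attributed each class.

    coded_answers maps a respondent ID to the set of classes attributed
    to that respondent's open answer (one or more classes per answer).
    """
    # Start every class at zero so unused classes still appear in the result.
    counts = Counter({c: 0 for c in scheme})
    for respondent, classes in coded_answers.items():
        unknown = set(classes) - scheme
        if unknown:
            raise ValueError(f"{respondent}: classes not in scheme: {unknown}")
        # Each respondent contributes at most once per class.
        counts.update(set(classes))
    return counts


if __name__ == "__main__":
    closed = tally_closed(["Yes", "No", "Yes"])
    coded = {
        "r1": {"BadCustomerSupport"},
        "r2": {"BadCustomerSupport", "IssuesWithWebsite"},
        "r3": {"HighPrices"},
    }
    print(closed)
    print(tally_open(coded, SCHEME))
```

Because an answer may be attributed more than one class, the per-class counts for open answers can sum to more than the number of respondents. A closed single-choice question never has this property.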