Alana: Social dialogue using an ensemble model and a ranker trained on user feedback
We describe our Alexa Prize system (called ‘Alana’), which consists of an ensemble of bots combining rule-based and machine-learning systems, and which uses a contextual ranking mechanism to choose system responses. This paper reports on the version of the system developed and evaluated in the semi-finals of the competition (i.e. up to 15 August 2017), but not on subsequent enhancements. The ranker for this system was trained on real user feedback received during the competition; we address the problem of how to train on the noisy and sparse feedback obtained in this way. In order to avoid initial problems of inappropriate and boring utterances originating from large datasets such as Reddit and Twitter, we later focussed on ‘clean’ data sources such as news and facts. We report on experiments with different ranking functions and versions of our NewsBot. We find that a multi-turn news strategy is beneficial, and that a ranker trained on users’ ratings feedback is also effective. Our system continuously improved using the data gathered over the course of the competition (1 July – 15 August). Our final user score (the average user rating over the whole semi-finals period) was 3.12, and we achieved 3.3 for the average user rating over the last week of the semi-finals (8–15 August 2017). We were also able to achieve long dialogues (10.7 turns on average) during the competition period. In subsequent weeks, after the end of the semi-final competition, we achieved our highest scores of 3.52 (daily average, 18 October) and 3.45 (weekly average on 23 and 24 October), an average dialogue length of 14.6 turns (1 October), and a median dialogue length of 2.25 minutes (7-day average on 10 October).
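The contextual ranking mechanism described above can be sketched as follows. This is a minimal illustration, not the actual Alana implementation: it assumes a simple linear scorer over two hypothetical hand-crafted features (word overlap with the dialogue context, and response length), whereas the paper's ranker was trained on user ratings feedback.

```python
# Minimal sketch of a contextual response ranker.
# The feature set and weights below are illustrative assumptions,
# not those used in the Alana system.

def features(context, candidate):
    """Hypothetical features: word overlap with the context, and length."""
    ctx_words = set(context.lower().split())
    cand_words = set(candidate.lower().split())
    overlap = len(ctx_words & cand_words) / max(len(cand_words), 1)
    length = min(len(candidate.split()) / 20.0, 1.0)  # capped length score
    return [overlap, length]

def rank(context, candidates, weights=(1.0, 0.5)):
    """Sort candidate bot responses by descending score under a linear model.

    In a trained ranker, `weights` would be learned from user feedback;
    here they are fixed for illustration.
    """
    def score(c):
        return sum(w * f for w, f in zip(weights, features(context, c)))
    return sorted(candidates, key=score, reverse=True)

# Each ensemble bot proposes a candidate; the ranker picks one to utter.
context = "Tell me some news about space exploration"
candidates = [
    "I don't know.",
    "Here is a recent story about space exploration and the new Mars rover.",
]
best = rank(context, candidates)[0]
```

In the full system, each bot in the ensemble contributes candidate responses and the ranker selects among them in context; training the weights on ratings feedback is what the abstract refers to as the ranker "trained on user feedback".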