Converting the point of view of messages spoken to virtual assistants
Virtual assistants can be quite literal at times. If a user says “tell Bob I love him,” most virtual assistants will extract the message “I love him” and send it to the user’s contact named Bob, rather than properly converting the message to “I love you.” We designed a system that takes a voice message from one user, converts the point of view of the message, and then delivers the result to its target user. We developed a rule-based model, which integrates a linear text classification model, part-of-speech tagging, and constituency parsing with rule-based transformation methods. We also investigated Neural Machine Translation (NMT) approaches, including traditional recurrent networks, CopyNet, and T5. We explored five metrics to gauge both naturalness and faithfulness automatically, and we chose BLEU and METEOR for faithfulness, as well as relative perplexity under a separately trained language model (GPT) for naturalness. Transformer-CopyNet and T5 performed similarly on faithfulness metrics, with T5 scoring 63.8 for BLEU and 83.0 for METEOR. CopyNet was the most natural, with a relative perplexity of 1.59. CopyNet also has 37 times fewer parameters than T5. We have publicly released our dataset, which is composed of 46,565 crowd-sourced samples.
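To make the task concrete, the following is a minimal Python sketch of point-of-view conversion on the example above. It is a toy illustration, not the rule-based model described here: the mapping table `THIRD_TO_SECOND` and the function `convert_point_of_view` are hypothetical names, the table assumes a single male recipient, and it ignores verb agreement and pronoun ambiguity, which is where part-of-speech tagging and constituency parsing would be needed.

```python
import re

# Hypothetical toy mapping from third-person pronouns (referring to the
# recipient) to second person. Assumes one male recipient; this is NOT
# the rule set used by the system described above.
THIRD_TO_SECOND = {
    "he": "you",
    "him": "you",
    "his": "your",  # ambiguous: standalone "his" should become "yours"
    "himself": "yourself",
}

def convert_point_of_view(message: str) -> str:
    """Rewrite pronouns so the message reads naturally to its recipient."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        lowered = word.lower()
        if lowered not in THIRD_TO_SECOND:
            return word  # leave every other word untouched
        replacement = THIRD_TO_SECOND[lowered]
        # Preserve sentence-initial capitalization of the replaced word.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z']+", swap, message)

print(convert_point_of_view("I love him"))
```

A word-level swap like this breaks as soon as the verb has to agree with the new subject (for example, “he is late” would need “you are late”), which is one reason a purely lexical approach falls short.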