28 research outputs found
Incremental multi-party conversational AI for people with dementia
Spoken dialogue systems (SDSs, e.g. Siri and Alexa) are trained on huge corpora, helping
them accurately understand the ‘average’ user. Speech production is nuanced, however,
so some user groups fall outside the ‘average’. This thesis focuses on SDSs for people
with dementia (PwD). More naturally interactive and accessible SDSs can improve people's autonomy at home and in public spaces. Three challenges are tackled in this thesis:
ethical data collection, incrementality, and multi-party conversations (MPCs).
Part I details the motivations of this work, in the context of voice assistant accessibility,
with a specific focus on language technologies for people with dementia. The thesis is
then introduced in its entirety through published paper summaries, with a structure guide.
Part II focuses on data collection. An ethical framework is presented to guide responsible data collection. A data capture device is then presented to address novel challenges
introduced by COVID-19. Using the ethical framework and device, the DEICTIC corpus
was collected. It verified that, when talking to an SDS, PwD pause significantly more
often, and for significantly longer durations, than people without dementia. The corpus
also reveals that 28% of PwD’s interactions with an SDS are MPCs involving their partner. SDSs are not adapted for MPCs, so a second data collection was designed. Hospital
staff subsequently used this design with memory clinic patients and their companions.
Part III focuses on incrementality. Microsoft's incremental speech recognition proved the most
responsive, stable, and accurate, and the only one that preserves disfluent material. IBM's
services were the most suitable for MPCs. Two corpora were created and released to
explore incremental semantic parsing, together containing over 105,000 interrupted utterances paired with their underspecified meaning representations. SDSs interrupt users
who pause too long mid-utterance, forcing them, frustratingly, to repeat themselves.
The use of incremental clarification requests (iCRs, e.g. “author of what?”) leads to more
naturally interactive SDSs, and improves their accessibility for PwD. Another new corpus
was created and released, containing 3,000 human-elicited clarification requests. It was
used to show that some large language models (LLMs) can generate context-appropriate
iCRs, and can interpret clarification exchanges as if they were one uninterrupted turn.
Part IV tackles MPCs. The hospital corpus showed that MPCs elicit unique, complex behaviours. LLMs performed remarkably well at the new task of multi-party goal tracking when
given examples from the corpus. A multi-party SDS is required for further research, so
all the work presented in this thesis was integrated into one system, embodied by an ARI robot. It is designed to handle MPCs with memory clinic patients and their companions, and to be accessible for PwD. When PwD pause mid-utterance, the
system generates an appropriate iCR, and interprets the resulting clarification exchange.
In summary, this thesis identifies that PwD pause significantly more often, and for significantly longer durations, than people without dementia. Additionally, these interactions
are often multi-party. When mid-utterance pauses occur, interactions can be recovered
through the use of iCRs. Prompted with examples from the SLUICE-CR corpus, LLMs can generate effective
and human-like iCRs; they can also interpret clarification exchanges and multi-party interactions. This work was integrated and deployed on a social robot
to enable conversations between the robot, memory clinic patients, and their companions.
You have interrupted me again!: making voice assistants more dementia-friendly with incremental clarification
In spontaneous conversation, speakers seldom have a full plan of what they are going to say in advance: they need to conceptualise and plan incrementally as they articulate each word in turn. This often leads to long pauses mid-utterance. Listeners either wait out the pause, offer a possible completion, or respond with an incremental clarification request (iCR), intended to recover the rest of the truncated turn. The ability to generate iCRs in response to pauses is therefore important in building natural and robust everyday voice assistants (EVAs) such as Amazon Alexa. This becomes crucial with people with dementia (PwDs) as a target user group since they are known to pause longer and more frequently, with current state-of-the-art EVAs interrupting them prematurely, leading to frustration and breakdown of the interaction. In this article, we first use two existing corpora of truncated utterances to establish the generation of clarification requests as an effective strategy for recovering from interruptions. We then proceed to report on, analyse, and release SLUICE-CR: a new corpus of 3,000 crowdsourced, human-produced iCRs, the first of its kind. We use this corpus to probe the incremental processing capability of a number of state-of-the-art large language models (LLMs) by evaluating (1) the quality of the model's generated iCRs in response to incomplete questions and (2) the ability of said LLMs to respond correctly after the user's response to the generated iCR. For (1), our experiments show that the ability to generate contextually appropriate iCRs only emerges at larger LLM sizes and only when prompted with example iCRs from our corpus. For (2), our results are in line with (1): larger LLMs interpret incremental clarificational exchanges more effectively.
Overall, our results indicate that autoregressive language models (LMs) are, in principle, able to both understand and generate language incrementally, and that LLMs can be configured to handle speech phenomena more commonly produced by PwDs, mitigating frustration with today's EVAs by improving their accessibility.
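The abstract above notes that contextually appropriate iCRs only emerge when an LLM is prompted with example iCRs from the corpus. A minimal sketch of what such few-shot prompt construction might look like is below; the in-context examples, prompt wording, and function name are illustrative assumptions, not the paper's actual prompts or data.

```python
# Illustrative sketch: assembling a few-shot prompt so an LLM can
# generate an incremental clarification request (iCR) for a truncated
# utterance. All example pairs and wording here are hypothetical.

def build_icr_prompt(examples, truncated_utterance):
    """Pair truncated questions with the iCRs that recover them,
    then append the new truncated utterance for the model to clarify."""
    lines = ["Ask a short clarification question that recovers the "
             "missing part of each incomplete user utterance.\n"]
    for utterance, icr in examples:
        lines.append(f"User: {utterance} ...")
        lines.append(f"Assistant: {icr}")
    # The new truncated turn the model should respond to with an iCR
    lines.append(f"User: {truncated_utterance} ...")
    lines.append("Assistant:")
    return "\n".join(lines)

# Hypothetical in-context examples in the spirit of the corpus
examples = [
    ("Who wrote the", "The author of what?"),
    ("Play the song by", "A song by which artist?"),
]
prompt = build_icr_prompt(examples, "What year was the")
print(prompt)
```

The resulting string would be sent as a single prompt to the LLM under evaluation; the completion after the final `Assistant:` is the generated iCR.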
Grounding LLMs to In-prompt Instructions: Reducing Hallucinations Caused by Static Pre-training Knowledge
When deploying LLMs in certain commercial or research settings, domain-specific knowledge must be explicitly provided within the prompt. This in-prompt knowledge can conflict with an LLM’s static world knowledge learned at pre-training, causing model hallucination (see examples in Table 1). In safety-critical settings, like healthcare and finance, these hallucinations can harm vulnerable users. We have curated a QA corpus containing information that LLMs could not have seen at pre-training. Using our corpus, we have probed various LLMs, manipulating both the prompt and the knowledge representation. We have found that our ‘Jodie’ prompt consistently improves the model’s textual grounding to the given knowledge, and in turn the overall answer accuracy. This is true in both the healthcare and finance domains – improving accuracy by up to 28% (mean: 12%). We have also identified that hierarchical and direct node-property graph structures could lead to more interpretable and controllable systems that provide a natural language interface with real-time in-domain knowledge. Our corpus will enable further work on this critical challenge.
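The abstract describes supplying in-prompt knowledge, including node-property graph representations, so the model answers from given facts rather than its pre-training. A minimal sketch of this idea follows; the graph contents, serialisation format, and instruction wording are assumptions for illustration and are not the paper's ‘Jodie’ prompt.

```python
# Illustrative sketch: serialising a small node-property graph into
# prompt text and instructing the model to answer only from it.
# Graph contents and prompt wording below are hypothetical.

def serialise_graph(graph):
    """Flatten a node-property graph into one fact line per property."""
    lines = []
    for node, props in graph.items():
        for prop, value in props.items():
            lines.append(f"{node} | {prop} | {value}")
    return "\n".join(lines)

def build_grounded_prompt(graph, question):
    """Prepend the serialised knowledge and constrain the answer to it."""
    knowledge = serialise_graph(graph)
    return (
        "Answer using ONLY the knowledge below. If the answer is not "
        "present, say you do not know.\n\n"
        f"Knowledge:\n{knowledge}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical real-time, in-domain facts unseen at pre-training
graph = {
    "DrugX": {"approved": "2024", "interacts_with": "DrugY"},
}
print(build_grounded_prompt(graph, "When was DrugX approved?"))
```

Constraining the model to the serialised facts, and telling it to admit ignorance otherwise, is one simple way to reduce reliance on static pre-training knowledge.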
Building for speech: designing the next-generation of social robots for audio interaction
There have been significant advances in robotics, conversational AI, and spoken dialogue systems (SDSs) over the past few years, but we still do not find social robots in public spaces such as train stations, shopping malls, or hospital waiting rooms. In this paper, we argue that early-stage collaboration between robot designers and SDS researchers is crucial for creating social robots that can legitimately be used in real-world environments. We draw from our experiences running experiments with social robots, and the surrounding literature, to highlight recurring issues. Robots need better speakers, a greater number of high-quality microphones, quieter motors, and quieter fans to enable human-robot spoken interaction in the wild. If a robot were designed to meet these requirements, researchers could create SDSs that are more accessible, and able to handle multi-party conversations in populated environments. Robust robot joints are also needed to limit potential harm to older adults and other more vulnerable groups. We suggest practical steps towards future real-world deployments of conversational AI systems for human-robot interaction.
Clarifying Completions: Evaluating How LLMs Respond to Incomplete Questions
People understand and produce language incrementally on a word-by-word basis. This gives rise to many characteristic conversational phenomena including long mid-sentence pauses that are followed by incremental clarification requests (iCRs) intended to recover the rest of the truncated turn (see Fig. 1; (A), (B), (C)). The ability to generate iCRs is important in natural conversational AI systems, and crucial to their accessibility for users with memory impairment. In this paper, we collect, release, and analyse SLUICE-CR: a large corpus of 3,000 human-produced iCRs. We then use this corpus to probe the incremental processing capability of a number of state-of-the-art LLMs by evaluating the quality of the model's generated iCRs in response to incomplete questions. Our evaluations show that the ability to generate contextually appropriate iCRs only emerges at larger LLM sizes, and only when prompted with example iCRs from our corpus. They also indicate that autoregressive LMs are, in principle, able to both understand and generate language incrementally.
