
    Predictive Top-Down Integration of Prior Knowledge during Speech Perception

    A striking feature of human perception is that our subjective experience depends not only on sensory information from the environment but also on our prior knowledge or expectations. The precise mechanisms by which sensory information and prior knowledge are integrated remain unclear, with longstanding disagreement concerning whether integration is strictly feedforward or whether higher-level knowledge influences sensory processing through feedback connections. Here we used concurrent EEG and MEG recordings to determine how sensory information and prior knowledge are integrated in the brain during speech perception. We manipulated listeners' prior knowledge of speech content by presenting matching, mismatching, or neutral written text before a degraded (noise-vocoded) spoken word. When speech conformed to prior knowledge, subjective perceptual clarity was enhanced. This enhancement in clarity was associated with a spatiotemporal profile of brain activity uniquely consistent with a feedback process: activity in the inferior frontal gyrus was modulated by prior knowledge before activity in lower-level sensory regions of the superior temporal gyrus. In parallel, we parametrically varied the level of speech degradation, and therefore the amount of sensory detail, so that changes in neural responses attributable to sensory information and prior knowledge could be directly compared. Although sensory detail and prior knowledge both enhanced speech clarity, they had an opposite influence on the evoked response in the superior temporal gyrus. We argue that these data are best explained within the framework of predictive coding, in which sensory activity is compared with top-down predictions and only unexplained activity is propagated through the cortical hierarchy.
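    For readers unfamiliar with the noise-vocoding manipulation used here, the sketch below outlines the standard approach: filter speech into frequency bands, extract each band's temporal envelope, and re-impose the envelopes on band-limited noise. The channel count, band edges, and filter settings are illustrative assumptions, not the parameters used in this study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_channels=4, f_lo=100.0, f_hi=8000.0):
    """Noise-vocode a speech waveform: keep each band's temporal envelope,
    discard fine structure by re-imposing the envelope on band-limited noise.
    All parameter values here are illustrative assumptions."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced band edges
    noise = np.random.randn(len(speech))
    out = np.zeros_like(speech, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, speech)                 # band-limited speech
        env = np.abs(hilbert(band))                     # temporal envelope
        carrier = sosfiltfilt(sos, noise)               # band-limited noise carrier
        out += env * carrier                            # envelope imposed on noise
    # Match overall RMS to the original signal
    return out * (np.sqrt(np.mean(speech**2)) / (np.sqrt(np.mean(out**2)) + 1e-12))
```

    Varying n_channels changes how much sensory detail survives the manipulation, which is the kind of parametric degradation described in the abstract.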

    Less is more: latent learning is maximized by shorter training sessions in auditory perceptual learning

    Background: The time course and outcome of perceptual learning can be affected by the length and distribution of practice, but the training regimen parameters that govern these effects have received little systematic study in the auditory domain. We asked whether there was a minimum requirement on the number of trials within a training session for learning to occur, whether there was a maximum limit beyond which additional trials became ineffective, and whether multiple training sessions provided benefit over a single session. Methodology/Principal Findings: We investigated the efficacy of different regimens that varied in the distribution of practice across training sessions and in the overall amount of practice received on a frequency discrimination task. While learning was relatively robust to variations in regimen, the group with the shortest training sessions (~8 min) showed significantly faster learning in the early stages of training than groups with longer sessions. In later stages, the group with the longest training sessions (>1 hr) showed slower learning than the other groups, suggesting overtraining. Between-session improvements were inversely correlated with performance; they were largest at the start of training and decreased as training progressed. In a second experiment we found no additional longer-term improvement in performance, retention, or transfer of learning for a group that trained over 4 sessions (~4 hr in total) relative to a group that trained for a single session (~1 hr). However, the mechanisms of learning differed; the single-session group continued to improve in the days following cessation of training, whereas the multi-session group showed no further improvement once training had ceased. Conclusions/Significance: Shorter training sessions were advantageous because they allowed more latent, between-session and post-training learning to emerge. These findings suggest that efficient regimens should use short training sessions and optimized spacing between sessions.

    Does training with amplitude modulated tones affect tone-vocoded speech perception?

    Temporal-envelope cues are essential for successful speech perception. We asked here whether training on stimuli containing temporal-envelope cues without speech content can improve the perception of spectrally degraded (vocoded) speech in which the temporal envelope (but not the temporal fine structure) is largely preserved. Two groups of listeners were trained on different amplitude-modulation (AM) based tasks, either AM detection or AM-rate discrimination (21 blocks of 60 trials over two days, 1260 trials in total; AM rates: 4, 8, and 16 Hz), while an additional control group did not undertake any training. Consonant identification in vocoded vowel-consonant-vowel stimuli was tested before and after training on the AM tasks (or at an equivalent time interval for the control group). Following training, only the trained groups showed a significant improvement in the perception of vocoded speech, but the improvement did not significantly differ from that observed for controls. Thus, we do not find convincing evidence that this amount of training with temporal-envelope cues without speech content provides a significant benefit for vocoded speech intelligibility. Alternative training regimens using vocoded speech along the linguistic hierarchy should be explored.
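    As a concrete illustration of this kind of training stimulus, the sketch below generates a sinusoidally amplitude-modulated tone at the rates listed above; the carrier frequency, duration, level, and modulation depth are assumptions rather than the study's exact stimulus parameters.

```python
import numpy as np

def am_tone(duration=1.0, fs=44100, carrier_hz=1000.0, am_rate_hz=8.0, depth=1.0):
    """Sinusoidally amplitude-modulated pure tone, the kind of stimulus used for
    AM-detection / AM-rate-discrimination training (parameter values assumed)."""
    t = np.arange(int(duration * fs)) / fs
    envelope = 1.0 + depth * np.sin(2 * np.pi * am_rate_hz * t)  # modulation depth = depth
    return 0.5 * envelope * np.sin(2 * np.pi * carrier_hz * t)

# One stimulus for each of the three modulation rates mentioned in the abstract
stimuli = {rate: am_tone(am_rate_hz=rate) for rate in (4, 8, 16)}
```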

    Rapid computations of spectrotemporal prediction error support perception of degraded speech.

    Human speech perception can be described as Bayesian perceptual inference, but how are these Bayesian computations instantiated neurally? We used magnetoencephalographic recordings of brain responses to degraded spoken words and experimentally manipulated signal quality and prior knowledge. We first demonstrate that spectrotemporal modulations in speech are more strongly represented in neural responses than alternative speech representations (e.g. spectrogram or articulatory features). Critically, we found an interaction between speech signal quality and expectations from prior written text on the quality of neural representations; increased signal quality enhanced neural representations of speech that mismatched with prior expectations, but led to greater suppression of speech that matched prior expectations. This interaction is a unique neural signature of prediction error computations and is apparent in neural responses within 100 ms of speech input. Our findings contribute to the detailed specification of a computational model of speech perception based on predictive coding frameworks.
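    The prediction-error account described here has a simple computational core: the response reflects the part of the sensory input left unexplained by the top-down prediction. The toy sketch below (with invented feature vectors, not data or representations from the study) reproduces the reported interaction: raising signal quality increases the residual for a mismatching prior but decreases it for a matching prior.

```python
import numpy as np

def prediction_error(observed, predicted):
    """Residual of the sensory input after subtracting the top-down prediction."""
    return observed - predicted

# Toy spectrotemporal feature vectors (purely illustrative numbers)
clear_speech = np.array([1.0, 0.8, 0.2])
degraded = 0.4 * clear_speech               # lower signal quality = weaker features
matching_prior = clear_speech               # written text predicts the spoken word
mismatching_prior = np.array([0.1, 0.2, 0.9])

for quality, signal in [("low", degraded), ("high", clear_speech)]:
    for label, prior in [("match", matching_prior), ("mismatch", mismatching_prior)]:
        err = np.linalg.norm(prediction_error(signal, prior))
        print(f"quality={quality:4s} prior={label:8s} |error|={err:.2f}")
```

    In this toy example the error grows with signal quality when the prior mismatches and shrinks when it matches, which is the cross-over signature the abstract attributes to prediction error rather than sharpened representations.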

    Sustained neural rhythms reveal endogenous oscillations supporting speech perception.

    Rhythmic sensory or electrical stimulation will produce rhythmic brain responses. These rhythmic responses are often interpreted as endogenous neural oscillations aligned (or "entrained") to the stimulus rhythm. However, stimulus-aligned brain responses can also be explained as a sequence of evoked responses, which only appear regular due to the rhythmicity of the stimulus, without necessarily involving underlying neural oscillations. To distinguish evoked responses from true oscillatory activity, we tested whether rhythmic stimulation produces oscillatory responses that continue after the end of the stimulus. Such sustained effects provide evidence for true involvement of neural oscillations. In Experiment 1, we found that rhythmic intelligible, but not unintelligible, speech produces oscillatory responses in magnetoencephalography (MEG) which outlast the stimulus at parietal sensors. In Experiment 2, we found that transcranial alternating current stimulation (tACS) leads to rhythmic fluctuations in speech perception outcomes after the end of electrical stimulation. We further report that the phase relation between electroencephalography (EEG) responses and rhythmic intelligible speech can predict the tACS phase that leads to the most accurate speech perception. Together, we provide fundamental results for several lines of research, including neural entrainment and tACS, and reveal endogenous neural oscillations as a key underlying principle for speech perception.
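    One common way to quantify a phase relation between a neural response and a rhythmic stimulus is narrow-band filtering followed by the Hilbert transform; the sketch below takes that route. The rhythm rate, bandwidth, and filter order are assumptions and are not intended to reproduce the study's analysis pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def phase_relation(eeg, stim_envelope, fs, rhythm_hz=2.0, bandwidth=1.0):
    """Mean phase lag (radians) between one EEG channel and a rhythmic stimulus
    envelope, estimated in a narrow band around the stimulus rhythm."""
    sos = butter(2, [rhythm_hz - bandwidth / 2, rhythm_hz + bandwidth / 2],
                 btype="bandpass", fs=fs, output="sos")
    phi_eeg = np.angle(hilbert(sosfiltfilt(sos, eeg)))
    phi_stim = np.angle(hilbert(sosfiltfilt(sos, stim_envelope)))
    # Circular mean of the instantaneous phase difference across time
    return np.angle(np.mean(np.exp(1j * (phi_eeg - phi_stim))))
```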

    The Neural Time Course of Semantic Ambiguity Resolution in Speech Comprehension.

    Semantically ambiguous words challenge speech comprehension, particularly when listeners must select a less frequent (subordinate) meaning at disambiguation. Using combined magnetoencephalography (MEG) and EEG, we measured neural responses associated with distinct cognitive operations during semantic ambiguity resolution in spoken sentences: (i) initial activation and selection of meanings in response to an ambiguous word and (ii) sentence reinterpretation in response to subsequent disambiguation to a subordinate meaning. Ambiguous words elicited an increased neural response approximately 400-800 msec after their acoustic offset compared with unambiguous control words in left frontotemporal MEG sensors, corresponding to sources in bilateral frontotemporal brain regions. This response may reflect increased demands on processes by which multiple alternative meanings are activated and maintained until later selection. Disambiguating words heard after an ambiguous word were associated with marginally increased neural activity over bilateral temporal MEG sensors and a central cluster of EEG electrodes, which localized to similar bilateral frontal and left temporal regions. This later neural response may reflect effortful semantic integration or elicitation of prediction errors that guide reinterpretation of previously selected word meanings. Across participants, the amplitude of the ambiguity response showed a marginal positive correlation with comprehension scores, suggesting that sentence comprehension benefits from additional processing around the time of an ambiguous word. Better comprehenders may have increased availability of subordinate meanings, perhaps due to higher-quality lexical representations, as reflected in a positive correlation between vocabulary size and comprehension success.

    Detecting and representing predictable structure during auditory scene analysis

    We used psychophysics and MEG to test how sensitivity to input statistics facilitates auditory scene analysis (ASA). Human subjects listened to ‘scenes’ composed of concurrent tone-pip streams (sources). On occasional trials a new source appeared partway through the scene. Listeners were more accurate and quicker to detect source appearance in scenes composed of temporally regular (REG), rather than random (RAND), sources. MEG in passive listeners, and in those actively detecting appearance events, revealed increased sustained activity in auditory and parietal cortex in REG relative to RAND scenes, emerging ~400 ms after scene onset. Over and above this, appearance events in REG scenes were associated with increased responses relative to RAND scenes. The effect of temporal structure on appearance-evoked responses was delayed when listeners were focused on the scenes relative to when they were listening passively, consistent with the notion that attention reduces ‘surprise’. Overall, the results implicate a mechanism that tracks the predictability of multiple concurrent sources to facilitate both active and passive ASA.
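    A minimal sketch of how temporally regular (REG) and random (RAND) tone-pip streams might be constructed as onset-time sequences; the pip rate, scene duration, and number of concurrent sources are assumptions, not the study's stimulus parameters.

```python
import numpy as np

def pip_onsets(duration_s=3.0, rate_hz=10.0, regular=True, rng=None):
    """Onset times (s) of one tone-pip stream: isochronous onsets for REG,
    random inter-onset intervals with the same mean rate for RAND."""
    rng = rng or np.random.default_rng()
    n = int(duration_s * rate_hz)
    if regular:
        return np.arange(n) / rate_hz
    iois = rng.exponential(1.0 / rate_hz, size=n)  # random inter-onset intervals
    return np.cumsum(iois)

scene_reg = [pip_onsets(regular=True) for _ in range(4)]    # 4 concurrent REG sources
scene_rand = [pip_onsets(regular=False) for _ in range(4)]  # 4 concurrent RAND sources
```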