Vision Transformer with Key-select Routing Attention for Single Image Dehazing
We present Ksformer, utilizing Multi-scale Key-select Routing Attention
(MKRA) for intelligent selection of key areas through multi-channel,
multi-scale windows with a top-k operator, and Lightweight Frequency Processing
Module (LFPM) to enhance high-frequency features, outperforming other dehazing
methods in tests. Comment: 5 pages, 4 figures, IEICE Trans. Information and System
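As a rough illustration of what a top-k operator does inside an attention layer, the following minimal sketch lets a single query attend only to its k best-matching keys. It is a generic pure-Python example, not the authors' MKRA: the function name `topk_attention`, its arguments, and the per-query (rather than multi-scale, window-level) selection are all assumptions made for illustration.

```python
import math


def topk_attention(query, keys, values, k):
    """Attend from one query vector to only its k highest-scoring keys.

    Hypothetical helper for illustration; not taken from the Ksformer paper.
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product score of the query against every key.
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    # Top-k operator: indices of the k highest-scoring keys.
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax restricted to the selected keys; unselected keys get no weight.
    peak = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - peak) for i in keep}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}
    # Weighted sum of the selected value vectors.
    dim = len(values[0])
    out = [sum(weights[i] * values[i][d] for i in keep) for d in range(dim)]
    return out, sorted(keep)


keys = [[1, 0], [0, 1], [-1, 0]]
values = [[10, 0], [0, 10], [-10, 0]]
output, selected = topk_attention([1.0, 0.0], keys, values, k=1)
```

With k=1 only the best-matching key contributes, so the output is simply that key's value vector; larger k blends the selected values by their softmax weights.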
Interaction of perceptual grouping and crossmodal temporal capture in tactile apparent-motion
Previous studies have shown that in tasks requiring participants to report the direction of apparent motion, task-irrelevant mono-beeps can "capture" visual motion perception when the beeps occur temporally close to the visual stimuli. However, the contributions of the relative timing of multimodal events and the event structure, modulating uni- and/or crossmodal perceptual grouping, remain unclear. To examine this question and extend the investigation to the tactile modality, the current experiments presented tactile two-tap apparent-motion streams, with an SOA of 400 ms between successive, left-/right-hand middle-finger taps, accompanied by task-irrelevant, non-spatial auditory stimuli. The streams were presented for 90 seconds, and participants' task was to continuously report the perceived (left- or rightward) direction of tactile motion. In Experiment 1, each tactile stimulus was paired with an auditory beep, though odd-numbered taps were paired with an asynchronous beep, with audiotactile SOAs ranging from -75 ms to 75 ms. Perceived direction of tactile motion varied systematically with audiotactile SOA, indicative of a temporal-capture effect. In Experiment 2, two audiotactile SOAs, one short (75 ms) and one long (325 ms), were compared. The long-SOA condition preserved the crossmodal event structure (so the temporal-capture dynamics should have been similar to those in Experiment 1), but both beeps now occurred temporally close to the taps on one side (even-numbered taps). The two SOAs were found to produce opposite modulations of apparent motion, indicative of an influence of crossmodal grouping. In Experiment 3, only odd-numbered, but not even-numbered, taps were paired with auditory beeps. This abolished the temporal-capture effect and, instead, a dominant percept of apparent motion from the audiotactile side to the tactile-only side was observed independently of the SOA variation.
These findings suggest that asymmetric crossmodal grouping leads to an attentional modulation of apparent motion, which inhibits crossmodal temporal-capture effects.
Robust Temporal Averaging of Time Intervals Between Action and Sensation
Perception of the time interval between one’s own action (a finger tap) and the associated sensory feedback (a visual flash or an auditory beep) is critical for precise and flexible control of action and behavioral decision. Previous studies have examined temporal averaging for multiple time intervals and its role in perceptual organization and crossmodal integration. In the present study, we extended temporal averaging from sensory stimuli to the coupling of action and its sensory feedback. We investigated whether and how temporal averaging could be achieved with respect to the multiple intervals in a sequence of action-sensory feedback events, and hence affect the subsequent timing behavior. In the unimodal task, participants voluntarily tapped their index finger at a constant pace while receiving auditory feedback (beeps) with varied intervals as well as variances throughout the sequence. In the crossmodal task, for a given sequence, each tap was accompanied randomly by either a visual flash or an auditory beep as sensory feedback. When the sequence was over, participants produced a subsequent tap followed by either an auditory or a visual stimulus, which enclosed a probe interval. In both tasks, participants were required to make a two-alternative forced-choice (2AFC) judgment, indicating whether the probe interval was shorter or longer than the mean interval between taps and their associated sensory events in the preceding sequence. In both scenarios, participants’ judgments of the probe interval suggested that they had internalized the mean interval associated with specific bindings of action and sensation, showing a robust temporal averaging process for the interval between action and sensation.
Visual apparent motion can be modulated by task-irrelevant lexical information
Previous studies have repeatedly demonstrated the impact of Gestalt structural grouping principles upon the parsing of motion correspondence in ambiguous apparent motion. Here, by embedding Chinese characters in a visual Ternus display that comprised two stimulus frames, we showed that the perception of visual apparent motion can be modulated by activation of task-irrelevant lexical representations. Each frame had two disks, with the second disk of the first frame and the first disk of the second frame being presented at the same location. Observers could perceive either "element motion," in which the endmost disk is seen as moving back and forth while the middle disk at the central position remains stationary, or "group motion," in which both disks appear to move laterally as a whole. More reports of group motion, as opposed to element motion, were obtained when the embedded characters formed two-character compound words than when they formed nonwords, although this lexicality effect appeared to be attenuated by the use of the same characters at the overlapping position across the two frames. Thus, grouping of visual elements in a changing world can be guided by both structural principles and prior world knowledge, including lexical information.
DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment
Large language models encode a vast amount of semantic knowledge and possess
remarkable understanding and reasoning capabilities. Previous research has
explored how to ground language models in robotic tasks to ensure that the
sequences generated by the language model are both logically correct and
practically executable. However, low-level execution may deviate from the
high-level plan due to environmental perturbations or imperfect controller
design. In this paper, we propose DoReMi, a novel language model grounding
framework that enables immediate Detection and Recovery from Misalignments
between plan and execution. Specifically, LLMs are leveraged for both planning
and generating constraints for planned steps. These constraints can indicate
plan-execution misalignments and we use a vision question answering (VQA) model
to check constraints during low-level skill execution. If a misalignment
occurs, our method will call the language model to re-plan in order to recover
from misalignments. Experiments on various complex tasks including robot arms
and humanoid robots demonstrate that our method can lead to higher task success
rates and shorter task completion times. Videos of DoReMi are available at
https://sites.google.com/view/doremi-paper. Comment: 21 pages, 13 figures
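The detect-and-recover loop described in this abstract can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the DoReMi implementation: `run_with_recovery` and the four callables (`plan_fn` standing in for the language-model planner, `constraints_fn` for constraint generation, `execute_fn` for the low-level skill, `check_fn` for the VQA-based check) are hypothetical names, and the sketch checks constraints once after each step rather than continuously during execution.

```python
def run_with_recovery(plan_fn, constraints_fn, execute_fn, check_fn,
                      goal, max_replans=3):
    """Execute a plan step by step, re-planning when a constraint check fails.

    All callables are hypothetical stand-ins supplied by the caller.
    """
    replans = 0
    done = []                           # steps completed without violations
    steps = list(plan_fn(goal, done))   # initial high-level plan
    while steps:
        step = steps.pop(0)
        constraints = constraints_fn(step)  # constraints for this step
        execute_fn(step)                    # run the low-level skill
        violated = [c for c in constraints if not check_fn(c)]
        if violated:
            # Misalignment detected: give up or ask the planner for a new plan.
            if replans == max_replans:
                return False, done
            replans += 1
            steps = list(plan_fn(goal, done))
            continue
        done.append(step)
    return True, done
```

A caller would supply its own planner, skill executor, and checker; the `max_replans` cap is an added assumption so that the loop terminates even if recovery keeps failing.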
How ChatGPT transformed teachers: the role of basic psychological needs in enhancing digital competence
Introduction: With the rapid development of ChatGPT, its application in the field of education has garnered widespread attention. This study aims to explore the impact of ChatGPT on teachers’ digital competence (TDC) and the mediating role of basic psychological needs satisfaction (BPNS). Methods: The study was conducted in China, collecting questionnaire data from 632 teachers through the QuestionStar platform. Structural equation modeling was employed using SmartPLS 4.0 to examine the effects of ChatGPT usage on TDC and its relationship with BPNS. Results: The findings indicate that ChatGPT has a significant effect on TDC, primarily through the fulfillment of competence and relatedness needs, while the impact of autonomy on TDC was not significant. Discussion: The results indicate that ChatGPT can enhance TDC and improve intrinsic motivation by satisfying teachers’ basic psychological needs. It is recommended that the design of educational tools consider teachers’ psychological needs to promote their professional development and well-being. This provides practical guidance for educational institutions, emphasizing the importance of teachers in the digital transformation process.
