Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli
In natural vision both stimulus features and task-demands affect an observer's attention. However, the relationship between sensory-driven ("bottom-up") and task-dependent ("top-down") factors remains controversial: Can task-demands counteract strong sensory signals fully, quickly, and irrespective of bottom-up features? To measure attention under naturalistic conditions, we recorded eye movements in human observers while they viewed photographs of outdoor scenes. In the first experiment, smooth modulations of contrast biased the stimuli's sensory-driven saliency towards one side. In free viewing, observers' eye positions were immediately biased toward the high-contrast, i.e., high-saliency, side. However, this sensory-driven bias disappeared entirely when observers searched for a bull's-eye target embedded with equal probability on either side of the stimulus. When the target always occurred on the low-contrast side, observers' eye positions were immediately biased towards this low-saliency side, i.e., the sensory-driven bias reversed. Hence, task-demands not only override sensory-driven saliency but also actively countermand it. In a second experiment, a 5-Hz flicker replaced the contrast gradient. Although the bias was less persistent in free viewing, overriding and reversal took longer to deploy. Hence, insufficient sensory-driven saliency cannot account for the bias reversal. In a third experiment, subjects searched for a spot of locally increased contrast ("oddity") instead of the bull's-eye ("template"). In contrast to the other conditions, a slight sensory-driven free-viewing bias prevailed in this condition. In a fourth experiment, we demonstrate that template targets are detected faster than oddity targets at known locations, suggesting that the former induce a stronger top-down drive when used as search targets. Taken together, task-demands can override sensory-driven saliency in complex visual stimuli almost immediately, and the extent of overriding depends on the search target and the overridden feature, but not on the latter's free-viewing saliency.
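A minimal Python/NumPy sketch of the kind of contrast manipulation described above: a smooth left-to-right ramp scales local contrast so that sensory-driven saliency is biased toward one side. The linear ramp shape and the `strength` parameter are illustrative assumptions, not the study's actual stimulus-generation code.

```python
import numpy as np

def bias_contrast(image, strength=0.8):
    """Scale local contrast with a smooth left-to-right gradient.

    image: 2-D array of luminance values in [0, 1].
    strength: assumed gradient depth; 0 leaves the image unchanged.
    """
    h, w = image.shape
    mean = image.mean()
    # Linear ramp from (1 - strength) on the left to 1 on the right,
    # attenuating deviations from the mean (i.e., contrast) on one side.
    ramp = np.linspace(1.0 - strength, 1.0, w)[None, :]
    return mean + (image - mean) * ramp
```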
Motivational Objects in Natural Scenes (MONS): A Database of >800 Objects
In daily life, we are surrounded by objects with pre-existing motivational associations. However, these are rarely controlled for in experiments with natural stimuli. Research on natural stimuli would therefore benefit from stimuli with well-defined motivational properties; in turn, such stimuli also open new paths in research on motivation. Here we introduce a database of Motivational Objects in Natural Scenes (MONS). The database consists of 116 scenes. Each scene contains 2 to 7 objects placed at approximately equal distance from the scene center. Each scene was photographed in three versions, with one object (the critical object) replaced to vary the overall motivational value of the scene (appetitive, aversive, neutral) while maintaining high visual similarity between the three versions. Ratings of motivation, valence, arousal, and recognizability were obtained using internet-based questionnaires. Since the main objective was to provide stimuli of well-defined motivational value, three motivation scales were used: (1) desire to own the object; (2) approach/avoid; (3) desire to interact with the object. Three sets of ratings were obtained in independent sets of observers: for all 805 objects presented on a neutral background, for 348 critical objects presented in their scene context, and for the entire scenes. On the basis of the motivational ratings, objects were subdivided into aversive, neutral, and appetitive categories. The MONS database will provide a standardized basis for future studies on motivational value under realistic conditions.
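For work building on such ratings, a short Python/pandas sketch of how objects might be binned into the three motivational categories. The column name `motivation` and the cut-points are hypothetical; the database's actual scales and thresholds may differ.

```python
import pandas as pd

def categorize(ratings, low=-0.5, high=0.5):
    """Split objects into aversive/neutral/appetitive bins.

    ratings: DataFrame with a mean 'motivation' column per object
    (e.g., averaged over the three scales and rescaled so that 0 is
    neutral). Both the column name and the cut-points are assumptions.
    """
    bins = pd.cut(ratings["motivation"],
                  bins=[-float("inf"), low, high, float("inf")],
                  labels=["aversive", "neutral", "appetitive"])
    return ratings.assign(category=bins)
```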
Objects predict fixations better than early saliency
Humans move their eyes while looking at scenes and pictures. Eye movements correlate with shifts in attention and are thought to be a consequence of optimal resource allocation for high-level tasks such as visual recognition. Models of attention, such as "saliency maps," are often built on the assumption that "early" features (color, contrast, orientation, motion, and so forth) drive attention directly. We explore an alternative hypothesis: Observers attend to "interesting" objects. To test this hypothesis, we measure the eye position of human observers while they inspect photographs of common natural scenes. Our observers perform different tasks: artistic evaluation, analysis of content, and search. Immediately after each presentation, our observers are asked to name objects they saw. Weighted with recall frequency, these objects predict fixations in individual images better than early saliency, irrespective of task. Also, saliency combined with object positions predicts which objects are frequently named. This suggests that early saliency has only an indirect effect on attention, acting through recognized objects. Consequently, rather than treating attention as a mere preprocessing step for object recognition, models of both need to be integrated.
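A sketch of the comparison described above, assuming fixation data and object annotations in a simple ad-hoc format: an object-based prediction map is built by weighting each named object's mask with its recall frequency, and map quality is scored as ROC area for fixated versus random locations. This illustrates the evaluation logic only, not the authors' analysis code.

```python
import numpy as np

def object_map(shape, objects):
    """Fixation-prediction map from named objects.

    objects: list of (mask, recall_frequency) pairs, where mask is a
    boolean array marking one object's extent (an assumed format).
    """
    m = np.zeros(shape)
    for mask, freq in objects:
        m[mask] += freq
    return m

def auc(pred, fix_yx, n_rand=10_000, seed=0):
    """ROC area: map values at fixated vs. random locations."""
    rng = np.random.default_rng(seed)
    ys, xs = zip(*fix_yx)
    pos = pred[list(ys), list(xs)]
    neg = pred[rng.integers(0, pred.shape[0], n_rand),
               rng.integers(0, pred.shape[1], n_rand)]
    # Probability that a fixated location outscores a random one,
    # counting ties as half.
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties
```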
Predicting human gaze using low-level saliency combined with face detection
Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high-level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model's predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses.
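A hedged sketch of such a combination, using OpenCV's stock Viola-Jones cascade as a stand-in face detector and a simple weighted sum. The detector choice, the normalization, and the weight `w_face` are assumptions for illustration; the study's actual detector and combination rule may differ.

```python
import cv2
import numpy as np

def face_channel(gray):
    """Binary conspicuity map from a stock face detector
    (Viola-Jones here, as a stand-in)."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray)
    m = np.zeros(gray.shape, dtype=float)
    for (x, y, w, h) in faces:
        m[y:y + h, x:x + w] = 1.0
    return m

def combined_map(saliency, gray, w_face=0.5):
    """Weighted sum of a low-level saliency map and the face channel.
    w_face is an illustrative free parameter."""
    s = saliency / (saliency.max() + 1e-12)
    return (1 - w_face) * s + w_face * face_channel(gray)
```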
A bottom-up model of spatial attention predicts human error patterns in rapid scene recognition
Humans demonstrate a peculiar ability to detect complex targets in rapidly presented natural scenes. Recent studies suggest that (nearly) no focal attention is required for overall performance in such tasks. Little is known, however, of how detection performance varies from trial to trial and which stages in the processing hierarchy limit performance: bottom-up visual processing (attentional selection and/or recognition) or top-down factors (e.g., decision-making, memory, or alertness fluctuations)? To investigate the relative contribution of these factors, eight human observers performed an animal detection task in natural scenes presented at 20 Hz. Trial-by-trial performance was highly consistent across observers, far exceeding the prediction of independent errors. This consistency demonstrates that performance is not primarily limited by idiosyncratic factors but by visual processing. Two statistical stimulus properties, contrast variation in the target image and the information-theoretical measure of "surprise" in adjacent images, predict performance on a trial-by-trial basis. These measures are tightly related to spatial attention, demonstrating that spatial attention and rapid target detection share common mechanisms. To isolate the causal contribution of the surprise measure, eight additional observers performed the animal detection task in sequences that were reordered versions of those that all subjects had correctly recognized in the first experiment. Reordering increased surprise before and/or after the target while keeping the target and distractors themselves unchanged. Surprise enhancement impaired target detection in all observers. Consequently, and contrary to several previously published findings, our results demonstrate that attentional limitations, rather than target recognition alone, affect the detection of targets in rapidly presented visual sequences.
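As a rough illustration of the surprise idea (not the Bayesian surprise model the study builds on, which updates priors over feature maps), one can score each frame transition by the KL divergence between intensity histograms of adjacent frames in the 20-Hz sequence:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two unnormalized histograms."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def frame_surprise(prev, curr, bins=32):
    """Crude per-transition surprise score between adjacent frames.

    prev, curr: 2-D arrays with values in [0, 1] (an assumed range).
    This histogram proxy is a simplification of the full model.
    """
    hp, _ = np.histogram(prev, bins=bins, range=(0, 1))
    hc, _ = np.histogram(curr, bins=bins, range=(0, 1))
    return kl(hc.astype(float), hp.astype(float))
```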
Spatial attention increases performance but not subjective confidence in a discrimination task
Selective attention to a target yields faster and more accurate responses. Faster response times, in turn, are usually associated with increased subjective confidence. Could the decrease in reaction time in the presence of attention therefore simply reflect a shift toward more confident responses? Here we addressed the extent to which attention modulates accuracy, processing speed, and confidence independently. To probe the effect of spatial attention on performance, we used two attentional manipulations of a visual orientation discrimination task. We demonstrate that spatial attention significantly increases accuracy, whereas subjective confidence measures reveal overconfidence in non-attended stimuli. At constant confidence levels, reaction times showed a significant decrease (by 15–49%, corresponding to 100–250 ms). This dissociation of objective performance and subjective confidence suggests that attention and awareness, as measured by confidence, are distinct, albeit related, phenomena.
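The key analysis, comparing accuracy and reaction time at matched confidence levels across attention conditions, reduces to a grouped summary. A minimal pandas sketch with an assumed trial-table layout (the study's actual data format is not specified here):

```python
import pandas as pd

def rt_by_confidence(trials):
    """Accuracy and mean RT per confidence level, split by attention.

    trials: DataFrame with columns 'attended' (bool), 'confidence'
    (ordinal rating), 'correct' (bool), and 'rt' (seconds); this
    layout is an assumption for illustration.
    """
    g = trials.groupby(["confidence", "attended"])
    return g.agg(accuracy=("correct", "mean"), mean_rt=("rt", "mean"))
```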
The role of first- and second-order stimulus features for human overt attention
When processing complex visual input, human observers sequentially allocate their attention to different subsets of the stimulus. What are the mechanisms and strategies that guide this selection process? We investigated the influence of various stimulus features on human overt attention, that is, attention related to shifts of gaze, using natural color images and modified versions thereof. Our experimental modifications, systematic changes of hue across the entire image, influenced only the global appearance of the stimuli, leaving the local features under investigation unaffected. We demonstrated that these modifications consistently reduce the subjective interpretation of a stimulus as "natural" across observers. By analyzing fixations, we found that first-order features, such as luminance contrast, saturation, and color contrast along either of the cardinal axes, correlated with overt attention in the modified images. In contrast, no such correlation was found in unmodified outdoor images. Second-order luminance contrast ("texture contrast") correlated with overt attention in all conditions. However, although none of the second-order color contrasts were correlated with overt attention in unmodified images, one of them did exhibit a significant correlation in the modified images. These findings imply, on the one hand, that higher-order bottom-up effects, namely those of second-order luminance contrast, may partially account for human overt attention. On the other hand, these results also demonstrate that global image properties, which correlate with the subjective impression of a scene being "natural," affect the guidance of human overt attention.
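Second-order luminance contrast ("texture contrast") is commonly computed by applying a local-contrast operator twice; a NumPy/SciPy sketch under that assumption, with an illustrative patch size:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast(lum, size=16):
    """First-order feature: local RMS luminance contrast in a
    size x size neighborhood."""
    mu = uniform_filter(lum, size)
    var = uniform_filter(lum ** 2, size) - mu ** 2
    return np.sqrt(np.clip(var, 0, None))  # clip float round-off

def texture_contrast(lum, size=16):
    """Second-order feature ('contrast of contrast'): the same
    operator applied to the first-order contrast map."""
    return local_contrast(local_contrast(lum, size), size)
```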
Using binocular rivalry to tag foreground sounds: Towards an objective visual measure for auditory multistability
In binocular rivalry, paradigms have been proposed for unobtrusive moment-by-moment readout of observers' perceptual experience ("no-report paradigms"). Here, we take a first step to extend this concept to auditory multistability. Observers continuously reported which of two concurrent tone sequences they perceived in the foreground: high-pitch (1008 Hz) or low-pitch (400 Hz) tones. Interstimulus intervals were either fixed per sequence (Experiments 1 and 2) or random with tones alternating (Experiment 3). A horizontally drifting grating was presented to each eye; to induce binocular rivalry, gratings had distinct colors and motion directions. To associate each grating with one tone sequence, a pattern on the grating jumped vertically whenever the respective tone occurred. We found that the direction of the optokinetic nystagmus (OKN), induced by the visually dominant grating, could be used to decode the tone (high/low) that was perceived in the foreground well above chance. This OKN-based readout improved after observers had gained experience with the auditory task (Experiments 1 and 2) and for simpler auditory tasks (Experiment 3). We found no evidence that the visual stimulus affected auditory multistability. Although decoding performance is still far from perfect, our paradigm may eventually provide a continuous estimate of the currently dominant percept in auditory multistability.
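A simplified sketch of the OKN-based readout: decode the foreground percept from the sign of net horizontal eye drift in short windows. The velocity estimate, the window length, and the sign-to-tone mapping are all assumptions; the study's actual decoder may differ.

```python
import numpy as np

def decode_percept(eye_x, fs=500, win=1.0):
    """Decode the foreground tone from OKN slow-phase direction.

    eye_x: horizontal eye-position trace (deg); fs: sampling rate (Hz);
    win: decoding window (s). Median velocity per window crudely
    suppresses saccadic fast phases; the mapping of drift sign to
    'high'/'low' is arbitrary here and would need calibration.
    """
    v = np.diff(eye_x) * fs        # velocity in deg/s
    n = int(win * fs)
    labels = []
    for i in range(0, len(v) - n, n):
        labels.append("high" if np.median(v[i:i + n]) > 0 else "low")
    return labels
```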
The relation of phase noise and luminance contrast to overt attention in complex visual stimuli
Models of attention are typically based on difference maps in low-level features but neglect higher-order stimulus structure. To what extent do higher-order statistics affect human attention in natural stimuli? We recorded eye movements while observers viewed unmodified and modified images of natural scenes. Modifications included contrast modulations (resulting in changes to first- and second-order statistics), as well as the addition of noise to the Fourier phase (resulting in changes to higher-order statistics). We have the following findings: (1) Subjects' interpretation of a stimulus as a "natural" depiction of an outdoor scene depends on higher-order statistics in a highly nonlinear, categorical fashion. (2) Confirming previous findings, contrast is elevated at fixated locations for a variety of stimulus categories. In addition, we find that the size of this elevation depends on higher-order statistics and decreases with increasing phase noise. (3) Global modulations of contrast bias eye position toward high contrasts, consistent with a linear effect of contrast on fixation probability. This bias is independent of phase noise. (4) Small patches of locally decreased contrast repel eye position less than large patches of the same aggregate area, irrespective of phase noise. Our findings provide evidence that deviations from surrounding statistics, rather than contrast per se, underlie the well-established relation of contrast to fixation.
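The Fourier phase-noise manipulation can be sketched in a few lines of NumPy; the exact parameterization of the noise level in the study may differ.

```python
import numpy as np

def add_phase_noise(image, level, seed=0):
    """Perturb the Fourier phase while keeping the amplitude spectrum.

    level: noise magnitude in [0, 1]; 1 corresponds to phase offsets
    drawn uniformly from [-pi, pi] (an assumed parameterization).
    """
    rng = np.random.default_rng(seed)
    f = np.fft.fft2(image)
    noise = level * np.pi * rng.uniform(-1, 1, image.shape)
    noisy = np.abs(f) * np.exp(1j * (np.angle(f) + noise))
    # Independent phase noise breaks conjugate symmetry; taking the
    # real part discards the small imaginary residue.
    return np.real(np.fft.ifft2(noisy))
```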
The world from a cat's perspective - statistics of natural videos
The mammalian visual system is one of the most intensively investigated sensory systems. However, our knowledge of the typical input it is operating on is surprisingly limited. To address this issue, we seek to learn about the natural visual environment and the world as seen by a cat. With a CCD camera attached to their heads, cats explore several outdoor environments, and videos of natural stimuli are recorded from the animals' perspective. The statistical analysis of these videos reveals several remarkable properties. First, we find an anisotropy of oriented contours, with an enhanced occurrence of horizontal orientations, earlier described in the "oblique effect" as a predominance of the two cardinal orientations. Second, contrast is not elevated in the center of the images, suggesting different mechanisms of fixation-point selection as compared to humans. Third, analyzing sequences of images, we find that the precise position of contours varies faster than their orientation. Finally, collinear contours prevail over parallel shifted contours, matching recent physiological and anatomical results. These findings demonstrate the rich structure of natural visual stimuli and its direct relation to extensively studied anatomical and physiological issues.
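A gradient-based sketch of the orientation-anisotropy analysis: an energy-weighted histogram of local contour orientations, in which a predominance of horizontal contours appears as excess energy near orientation 0. This is a simplification of the filter-bank analyses such studies typically use.

```python
import numpy as np

def orientation_histogram(frame, nbins=8):
    """Energy-weighted histogram of local contour orientations.

    frame: 2-D grayscale video frame. Returns (normalized histogram,
    bin edges) over [0, pi); horizontal contours fall near 0.
    """
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    # Contour orientation is perpendicular to the gradient direction.
    ori = (np.arctan2(gy, gx) + np.pi / 2) % np.pi
    hist, edges = np.histogram(ori, bins=nbins, range=(0, np.pi),
                               weights=mag)
    return hist / hist.sum(), edges
```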
