Dicky N. Sihite

Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task and 2) bottom-up factors that highlight image regions that are different from their surroundings. The latter are often referred to as “visual …
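As a toy illustration of the bottom-up component described in this abstract, a minimal sketch of center-surround saliency follows. This is a deliberate simplification (real models use multi-scale feature channels for color, intensity, orientation, etc.); the function name and window sizes are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def center_surround(image, center=3, surround=15):
    """Toy bottom-up saliency: absolute difference between a small 'center'
    average and a larger 'surround' average, so pixels that differ from
    their local neighborhood score high."""
    img = image.astype(float)
    c = uniform_filter(img, size=center)      # local (center) mean
    s = uniform_filter(img, size=surround)    # wider (surround) mean
    return np.abs(c - s)
```

A lone bright pixel on a dark background, for example, scores higher than the uniform background around it.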
Significant recent progress has been made in developing high-quality saliency models. However, less effort has been devoted to the fair assessment of these models over large standardized datasets and to correctly addressing confounding factors. In this study, we take a critical and quantitative look at challenges (e.g., center-bias, map smoothing) in …
Despite a considerable amount of previous work on bottom-up saliency modeling for predicting human fixations over static and dynamic stimuli, few studies have thus far attempted to model top-down and task-driven influences on visual attention. Here, taking advantage of the sequential nature of real-world tasks, we propose a unified Bayesian approach for …
Several visual attention models have been proposed for describing eye movements over simple stimuli and tasks such as free viewing or visual search. Yet, to date, there exists no computational framework that can reliably mimic human gaze behavior in more complex environments and tasks such as urban driving. In addition, benchmark datasets, scoring …
One challenge when tracking objects is to adapt the object representation depending on the scene context to account for changes in illumination, coloring, scaling, etc. Here, we present a solution that is based on our earlier approach for object tracking using particle filters and component-based descriptors. We extend the approach to deal with changing …
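To make the particle-filter idea concrete, here is a minimal one-dimensional predict-update-resample cycle. This is a generic sketch, not the paper's component-based tracker; the motion and observation noise levels and function name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, weights, observation,
                         motion_std=1.0, obs_std=2.0):
    """One cycle of a 1-D particle filter: diffuse particles with a
    random-walk motion model, reweight by observation likelihood,
    then resample proportionally to weight."""
    # Predict: propagate each particle through the motion model
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: Gaussian likelihood of the observation given each particle
    weights = weights * np.exp(-0.5 * ((particles - observation) / obs_std) ** 2)
    weights /= weights.sum()
    # Resample: draw particles with probability proportional to weight
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

Repeating this step with a fixed observation concentrates the particle cloud near the observed position, which is the mechanism the tracking approach builds on.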
Einhäuser, Spain, and Perona (2008) explored an alternative hypothesis to saliency maps (i.e., spatial image outliers) and claimed that "objects predict fixations better than early saliency." To test their hypothesis, they measured eye movements of human observers while they inspected 93 photographs of common natural scenes (Uncommon Places dataset by …
Eye tracking has become the de facto standard measure of visual attention in tasks that range from free viewing to complex daily activities. In particular, saliency models are often evaluated by their ability to predict human gaze patterns. However, fixations are not only influenced by bottom-up saliency (computed by the models), but also by many top-down …
Our models estimate the state of a human subject performing a task (here, playing video games), and map that state to an eye position. Factors influencing state come from scene gist, physical actions, events, and bottom-up saliency. Proposed models fall into two categories. In the first category, we use classical discriminative classifiers, including …
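One way to read the first category, sketched here with made-up data since the abstract only partially specifies the features and classifiers: a discriminative classifier maps a per-frame state vector (gist, actions, saliency summaries) to a discretized gaze location. The feature dimensionality, grid size, and classifier choice below are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical per-frame state features (gist, action, saliency descriptors)
X = rng.random((200, 8))
# Gaze position discretized into a 4x4 grid -> 16 class labels
gaze_bins = rng.integers(0, 16, 200)

# Train a discriminative classifier from state to gaze bin
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, gaze_bins)
pred = clf.predict(X[:5])   # predicted gaze bins for the first five frames
```

Any classifier with a fit/predict interface could stand in here; the key point is the mapping from task state to eye position.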
We introduce a new task-independent framework to model top-down overt visual attention based on graphical models for probabilistic inference and reasoning. We describe a Dynamic Bayesian Network (DBN) that infers probability distributions over attended objects and spatial locations directly from observed data. Probabilistic inference in our model is …
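The core inference in such a DBN can be illustrated with a toy two-object forward-filtering step: a belief over the attended object is propagated through a transition model and reweighted by each observation. The transition and emission probabilities below are invented for illustration and do not reflect the paper's actual network structure.

```python
import numpy as np

# Toy DBN slice with two attended-object states
T = np.array([[0.8, 0.2],    # P(obj_t | obj_{t-1}): rows are previous state
              [0.3, 0.7]])
E = np.array([[0.9, 0.1],    # P(obs | obj): rows are object, cols observation
              [0.2, 0.8]])

def forward_step(belief, obs):
    """One step of forward filtering: predict via T, weight by E, normalize."""
    b = (belief @ T) * E[:, obs]
    return b / b.sum()

belief = np.array([0.5, 0.5])
for obs in [0, 0, 1]:        # a hypothetical observation sequence
    belief = forward_step(belief, obs)
```

Each step yields a proper probability distribution over attended objects, which is the kind of quantity the DBN infers from observed data.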
We compare 32 saliency models. Details of our model comparison setup and compared models are provided in this supplement. We report eye movement prediction accuracy of models using three scores: Shuffled AUC (sAUC), Correlation Coefficient (CC), and Normalized Scanpath Saliency (NSS). Additionally, we show results of statistical significance (statistical …
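Two of these scores admit short definitions: NSS is the mean of the z-scored saliency map at fixated pixels, and CC is the Pearson correlation between the saliency map and a fixation density map. The sketch below follows those standard definitions; it is not the comparison's exact evaluation code, and the array names are hypothetical.

```python
import numpy as np

def nss(sal_map, fixations):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels.
    sal_map: 2-D saliency map; fixations: boolean mask of fixated pixels."""
    z = (sal_map - sal_map.mean()) / sal_map.std()
    return z[fixations].mean()

def cc(sal_map, fix_density):
    """Correlation Coefficient between a saliency map and a fixation
    density map (both z-scored, then averaged elementwise)."""
    s = (sal_map - sal_map.mean()) / sal_map.std()
    f = (fix_density - fix_density.mean()) / fix_density.std()
    return (s * f).mean()
```

A chance-level model scores about 0 on both; higher is better, with CC bounded by 1.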