Efficient coding -- the idea that a sensory system is tuned to the statistics of its natural inputs -- has been a successful framework for understanding early visual processing in terms of first- and simple second-order statistics of natural scenes. However, it remains unclear whether, and to what extent, higher-order correlations shape neural processing beyond the periphery. Yet we know that these correlations are important for perception: removing them from an image renders it unrecognizable. We are working to understand how higher-order image features shape neural processing beyond the sensory periphery.
Previous work found that some spatial configurations of local correlations are informative, while others are not. We focus on one particular informative configuration: the set of four pixels arranged in a 2x2 square. While there are 16 possible ways to color this 2x2 square with binary pixels, translation invariance imposes constraints that reduce the number of independent degrees of freedom to 10. These 10 degrees of freedom can be parametrized by a set of coordinates defined in terms of the likelihood of sampling different colorings from an image patch. These coordinates include one first-order coordinate that describes luminance, four second-order coordinates that describe pairwise correlations, four third-order coordinates that describe three-point correlations, and one fourth-order coordinate that describes a single four-point correlation. These coordinates can be extracted from natural scenes, or they can be used to generate synthetic textures with prescribed correlational structure.
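The coordinates above can be computed directly as correlation averages. The sketch below illustrates one common way to do this: map binary pixels to +/-1 and average products over all 2x2 blocks of a patch. The names (gamma, beta, theta, alpha) follow a common convention for these statistics, but the exact signs and normalizations here are illustrative choices, not necessarily the conventions used in our analyses.

```python
import numpy as np

def texture_coordinates(patch):
    """Compute 10 local texture coordinates of a binary image patch.

    The patch contains values in {0, 1}; pixels are mapped to {-1, +1},
    and each coordinate is an average product over all 2x2 blocks.
    Illustrative sketch: signs/normalizations are one possible convention.
    """
    s = 2.0 * np.asarray(patch, dtype=float) - 1.0  # {0,1} -> {-1,+1}
    # The four corners of every 2x2 block in the patch
    ul, ur = s[:-1, :-1], s[:-1, 1:]   # upper-left, upper-right
    ll, lr = s[1:, :-1], s[1:, 1:]     # lower-left, lower-right
    return {
        "gamma":   s.mean(),                     # first order: luminance
        "beta_h":  (ul * ur).mean(),             # second order: horizontal pairs
        "beta_v":  (ul * ll).mean(),             # vertical pairs
        "beta_d1": (ul * lr).mean(),             # diagonal pairs
        "beta_d2": (ur * ll).mean(),             # anti-diagonal pairs
        "theta_1": (ul * ur * ll).mean(),        # third order: the four
        "theta_2": (ul * ur * lr).mean(),        #   L-shaped triples
        "theta_3": (ul * ll * lr).mean(),
        "theta_4": (ur * ll * lr).mean(),
        "alpha":   (ul * ur * ll * lr).mean(),   # fourth order: full 2x2 product
    }
```

Sanity checks: an all-white patch has every coordinate equal to 1, while a checkerboard has beta_h = beta_v = -1 and alpha = +1.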
We work with a database of natural scenes that was originally collected in the Okavango Delta in Botswana and is freely available online.
In order to extract the statistical structure of natural scenes, we first preprocess and binarize an ensemble of natural image patches. For each patch in this ensemble, we then collect a histogram of binary colorings, and we use this histogram to compute values for each of the 10 coordinates described in the previous section. This assigns each image patch a location in the 10-dimensional space of image statistic coordinates. Repeated across patches, this process yields a multidimensional distribution of coordinate values that describes the local correlational structure of the ensemble of image patches.
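A minimal sketch of this pipeline is shown below, using a synthetic stand-in for the natural image ensemble (random grayscale patches binarized at their median -- the actual preprocessing is more involved). Each 2x2 block is encoded as a 4-bit index, the histogram of the 16 colorings is collected, and the fourth-order coordinate is read out as a parity-weighted sum over that histogram; the function names are hypothetical.

```python
import numpy as np

def coloring_histogram(patch):
    """Histogram of the 16 possible binary colorings of 2x2 blocks in a patch."""
    p = np.asarray(patch, dtype=int)
    # Encode each 2x2 block (upper-left, upper-right, lower-left, lower-right)
    # as a 4-bit index in 0..15
    idx = (8 * p[:-1, :-1] + 4 * p[:-1, 1:] + 2 * p[1:, :-1] + p[1:, 1:]).ravel()
    return np.bincount(idx, minlength=16) / idx.size

def alpha_from_histogram(h):
    """Fourth-order coordinate from a coloring histogram.

    With pixels mapped to +/-1, the 2x2 product is +1 for colorings with an
    even number of ones and -1 otherwise, so alpha is a parity-weighted sum.
    """
    parity = np.array([(-1) ** bin(k).count("1") for k in range(16)])
    return float(np.dot(parity, h))

rng = np.random.default_rng(0)
# Stand-in ensemble: random grayscale patches, binarized at each patch's median
patches = rng.standard_normal((500, 32, 32))
alphas = [alpha_from_histogram(coloring_histogram(p > np.median(p))) for p in patches]
# Repeating the computation across patches yields a distribution of coordinate
# values; for white noise this distribution is centered near zero
print(np.mean(alphas), np.std(alphas))
```

For a natural image ensemble, the same loop produces the multidimensional distribution of coordinate values described above.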
Just as local correlations can be extracted from natural scenes, they can also be used to generate synthetic textures with prescribed correlational structure. Such textures can be generated by tuning individual coordinates, or by jointly tuning pairs of coordinates (and in principle, larger combinations, although this quickly becomes intractable). These synthetic textures can then be used as stimuli in psychophysical studies of visual sensitivity.
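As an illustration of tuning a single coordinate, the sketch below generates a texture with a prescribed four-point correlation using a standard construction for pure fourth-order textures: seed the first row and column at random, then fill each remaining pixel so that the 2x2 block it completes has product +1 (pixels as +/-1) with probability (1 + alpha) / 2. The function name and interface are hypothetical.

```python
import numpy as np

def alpha_texture(n, alpha, seed=None):
    """Generate an n x n binary texture with prescribed four-point correlation.

    Sketch of a maximum-entropy construction for a pure fourth-order texture:
    the first row and column are random, and each remaining pixel is chosen so
    that the 2x2 block it completes has product +1 with probability
    (1 + alpha) / 2, with pixels represented as +/-1.
    """
    rng = np.random.default_rng(seed)
    s = np.empty((n, n), dtype=int)
    s[0, :] = rng.choice([-1, 1], size=n)
    s[:, 0] = rng.choice([-1, 1], size=n)
    for i in range(1, n):
        for j in range(1, n):
            # Desired sign of the 2x2 block product containing pixel (i, j)
            parity = rng.choice([-1, 1], p=[(1 - alpha) / 2, (1 + alpha) / 2])
            s[i, j] = parity * s[i - 1, j - 1] * s[i - 1, j] * s[i, j - 1]
    return (s + 1) // 2  # back to {0, 1}
```

Because each 2x2 block product equals the sampled parity, the empirical four-point correlation of the output matches the prescribed alpha; analogous constructions exist for the other coordinates and for jointly tuned pairs.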
Collaborators at Weill Cornell Medical College have an extensive body of work in which they probe human sensitivity to local correlational structure in a figure/ground segmentation task. In this task, figure and ground are distinguished by their local correlational structure; either a patch of correlated texture is placed on a white noise background, or vice versa. The task consists of a brief (120 ms) presentation of a stimulus, followed by a white-noise mask. Subjects are then asked to identify the location (top, bottom, left, or right) of the target. After many such judgements, one can measure the fraction of correct responses as a function of the strength of the image statistic used to generate the target/ground. Sensitivity is measured as the value of the image statistic at which performance is halfway between chance and perfect.
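The readout of such an experiment can be sketched as follows, using made-up psychometric data (the numbers below are illustrative, not measured values). With four possible target locations, chance performance is 25%, so the halfway criterion sits at 62.5% correct; here the crossing point is found by simple linear interpolation, although fitting a parametric psychometric function (e.g. a Weibull) is also common.

```python
import numpy as np

# Hypothetical psychometric data from the 4-location task: fraction of
# correct target localizations at each tested statistic strength
strengths = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
frac_correct = np.array([0.25, 0.32, 0.55, 0.80, 0.93, 0.98])

chance, perfect = 0.25, 1.0            # four locations -> 25% guessing rate
criterion = (chance + perfect) / 2     # halfway between chance and perfect

# Statistic strength at which performance crosses the criterion
# (np.interp requires frac_correct to be increasing, as it is here)
threshold = np.interp(criterion, frac_correct, strengths)
print(threshold)
```

With these illustrative numbers the criterion falls between the points tested at strengths 0.4 and 0.6, so the interpolated crossing lies between them.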
We recently showed that the distribution of second-, third-, and fourth-order correlations extracted from natural scenes can robustly predict human sensitivity to the same correlations. Specifically, high variability in image statistics predicts high sensitivity to the same statistics. This relationship can be understood in terms of the efficient coding principle operating in a regime in which input noise dominates, which predicts that optimal sensitivity should increase with the variability of the features to be encoded.
While we find a consistent match between image statistic variability and human sensitivity, there are asymmetries in the natural image distributions that are not mirrored in human sensitivity. These differences suggest a set of possible constraints on the underlying mechanisms that encode these features. To better understand these differences, we are developing mechanistic models of resource allocation subject to such constraints.
The image statistic coordinates that we use to parametrize natural scenes must be sampled from a spatial region of an image. As such, the estimation of these coordinates depends on the size of the region being sampled. If the sampled region is too small, estimates will be corrupted by sampling error. If the sampled region is too large, estimates will average over local variations that could be important for determining boundaries between different textures. One might expect that there is an optimal region size for reliably estimating statistics while still resolving spatial variability. To better understand this, we are working to characterize the spatial- and scale-dependence of sampling variations.
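The small-region side of this tradeoff can be illustrated with a quick simulation: estimating a single second-order coordinate (horizontal pair correlation, a hypothetical stand-in for the full coordinate set) from white-noise patches of increasing size, and measuring the spread of the estimates. The sampling error shrinks roughly as one over the square root of the number of sampled pixel pairs.

```python
import numpy as np

rng = np.random.default_rng(1)

def beta_h(patch):
    """Horizontal pair correlation of a binary {0,1} patch (pixels as +/-1)."""
    s = 2.0 * patch - 1.0
    return (s[:, :-1] * s[:, 1:]).mean()

# Spread of the estimate across white-noise patches of each size: since the
# true correlation is zero, the spread is pure sampling error, and it shrinks
# roughly as 1 / sqrt(number of sampled pixel pairs)
spread = {}
for n in (8, 16, 32, 64):
    estimates = [beta_h(rng.integers(0, 2, size=(n, n))) for _ in range(2000)]
    spread[n] = np.std(estimates)
    print(n, spread[n])
```

The large-region side of the tradeoff is not captured by white noise; it appears once the underlying statistics vary across the image, as in the boundary setting discussed next.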
Preliminary analyses confirm, as one might expect, that coordinate estimates change more abruptly across a boundary between two textures than within a patch of a given texture. This suggests that such abrupt changes could be used to identify contours or object boundaries. This requires differentiating between noisy sampling fluctuations and reliable boundary-related fluctuations in the underlying distribution of correlations. We are currently exploring these ideas in an effort to characterize higher-order image features, such as objects and contour fragments.
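A toy version of this idea can be sketched as follows: build an image whose left half is white noise and whose right half has strong horizontal pair correlations, estimate a coordinate in a sliding window, and localize the texture boundary at the largest abrupt change in the estimate. Everything here (the stimulus, the window width, the single coordinate) is an illustrative simplification of the actual analyses.

```python
import numpy as np

rng = np.random.default_rng(2)

def beta_h(patch):
    """Horizontal pair correlation of a binary {0,1} patch (pixels as +/-1)."""
    s = 2.0 * patch - 1.0
    return (s[:, :-1] * s[:, 1:]).mean()

# Toy image: left half is white noise (beta_h near 0); right half has constant
# rows (beta_h = 1). The texture boundary sits at column 64.
left = rng.integers(0, 2, size=(64, 64))
right = np.repeat(rng.integers(0, 2, size=(64, 1)), 64, axis=1)
image = np.hstack([left, right])

# Estimate beta_h in a sliding vertical strip of width w, then look for the
# largest change between adjacent window positions
w = 16
centers = list(range(w // 2, 128 - w // 2))
profile = np.array([beta_h(image[:, c - w // 2 : c + w // 2]) for c in centers])
jumps = np.abs(np.diff(profile))
boundary = centers[int(np.argmax(jumps))]
print(boundary)
```

In this toy setting the largest jump lands within a window's width of the true boundary; the harder problem, distinguishing such jumps from sampling fluctuations of comparable size, is exactly the question raised above.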