Abstract
Everything that the brain sees must first be encoded by the retina, which maintains a reliable representation of the visual world across many different, complex natural scenes while also adapting to stimulus changes. Decomposing the population code into independent single-cell activity and cell-cell interactions reveals how broad scene structure is encoded in the adapted retinal output. By recording from the same retina while presenting many different natural movies, we find that the population structure, characterized by strong interactions, is consistent across both natural and synthetic stimuli. We show that these interactions contribute to encoding scene identity, and demonstrate that leveraging this underlying interaction network improves scene decoding. This population structure likely arises in part from shared bipolar cell input as well as from gap junctions between retinal ganglion cells and amacrine cells. Separately, we use a task-agnostic deep architecture, an encoder-decoder, to model the retinal encoding process and characterize its representation of 'time in the natural scene' in a compressed latent space. In this end-to-end training, an encoder learns a compressed latent representation from the retinal ganglion cell population, while a decoder samples from this latent space to generate the appropriate future scene frame. By comparing latent representations of retinal activity from three natural movies, we find that the retina has a generalizable encoding for time in natural scenes, and that this encoding can be used to decode future frames with up to 17 ms resolution. Lastly, we explore methods to efficiently scale small population models up to a large population using an aggregate approach.
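The sketch below illustrates the kind of end-to-end encoder-decoder described above: an encoder compresses a window of retinal ganglion cell population activity into a low-dimensional latent, and a decoder generates the subsequent scene frame from that latent. It is a minimal illustration in PyTorch, not the model used in this work; the input shapes, layer sizes, class names, and loss are all assumptions chosen for clarity.

```python
# Hypothetical sketch of an end-to-end encoder-decoder (not the thesis code).
# Assumes binned RGC spike counts of shape (batch, n_cells, n_bins) and
# grayscale future frames of shape (batch, 1, 64, 64); all sizes illustrative.
import torch
import torch.nn as nn


class RGCEncoder(nn.Module):
    """Compress a window of population activity into a low-dimensional latent."""

    def __init__(self, n_cells=100, n_bins=20, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                       # (batch, n_cells * n_bins)
            nn.Linear(n_cells * n_bins, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),         # compressed latent representation
        )

    def forward(self, spikes):
        return self.net(spikes)


class FrameDecoder(nn.Module):
    """Generate the future scene frame from the latent vector."""

    def __init__(self, latent_dim=16, frame_size=64):
        super().__init__()
        self.frame_size = frame_size
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, frame_size * frame_size),
            nn.Sigmoid(),                       # pixel intensities in [0, 1]
        )

    def forward(self, z):
        out = self.net(z)
        return out.view(-1, 1, self.frame_size, self.frame_size)


# End-to-end training: predict the frame that follows the spike window.
encoder, decoder = RGCEncoder(), FrameDecoder()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)
loss_fn = nn.MSELoss()

spikes = torch.rand(8, 100, 20)            # placeholder spike counts
future_frames = torch.rand(8, 1, 64, 64)   # placeholder target frames

for step in range(100):
    z = encoder(spikes)                    # latent representation of retinal activity
    pred = decoder(z)                      # decoded future scene frame
    loss = loss_fn(pred, future_frames)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In such a setup, comparing the latent vectors learned from different natural movies (for example, by measuring distances between latents at matched times) is one plausible way to probe whether the representation of time generalizes across scenes, as the abstract describes.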