Remnants of Perception: Comments on Block and the Function of Visual Working Memory

Jake Quilty-Dunn (Rutgers)

Word count: 3483 (text, incl. figure captions & footnotes) + 771 (references)

The Border Between Seeing and Thinking is an extraordinary achievement, the result of careful attention (and contribution) to both the science and philosophy of perception. The book offers some bold hypotheses. While the hypotheses themselves are worth the price of entry, Block's sustained defense of them grants the reader insight into countless fascinating experimental results and philosophical concepts. His unpretentious and accommodating exposition of the science—explaining rather than asserting, digging into specific results in detail rather than making summary judgments and demanding that readers take him at his word—is a model of how philosophers ought to engage with empirical evidence. It is simply not possible to read this book without learning something. It will surely play a foundational role in theoretical work on perception for many years to come.

1. The Perception–Cognition Border

At the center of the book's positive account of perception is Block's claim about the nature of visual representation, and how it differs from thought: the visual system represents the world the way an image does, and thought takes myriad forms, including forms more like language. More technically, vision is entirely iconic and cognition is paradigmatically (perhaps mostly?) discursive.1 Block provides extensive elaboration and defense of this claim throughout the book (especially Chapters 4 through 8), so in what follows I will assume the basic idea and evidence base are well-understood. For this sort of representational approach to the perception–cognition border, any interface point between perception and cognition provides a key test case. Does the way perception and cognition interact seem to suggest that they speak different languages and rely on some intermediating translation?
Or do they use a common interlingua that allows for free transfer of information? Of course, there are plainly significant limits on information transfer between perception and cognition. Your full belief that your friend is currently in Belgium doesn't prevent your visual system from perceiving their face on a street corner in Manhattan; your knowledge that footprints tend to be concave and that Figure 1 is lit from below doesn't (invariably, stably) prevent your visual system from using the "light-from-above prior" to see an unusual raised foot-shape; your belief that objects can persist as separate individuals despite being connected by a dotted line doesn't allow you to track them efficiently as separate individuals (Scholl et al. 2001); and so on for many visual phenomena.

1 For recent work on the discursive format of cognition, see Dehaene et al. 2022; Carcassi & Szymanik 2023; Kazanina & Poeppel forthcoming; Quilty-Dunn et al. forthcoming a, forthcoming b, and commentaries.

Figure 1. Concave footprint lit from below that might appear as a convex foot-shape lit from above; photo by Elviss Railijs Bitāns.

These failures of information transfer might stem from something other than a difference in representational format. They might instead stem from built-in limitations on information access. On modular approaches to perception, visual processes have access to their own stores of information, which might include the light-from-above prior but exclude cognitive information like your beliefs about illusions (Fodor 1983; Mandelbaum 2018; Quilty-Dunn 2020a). It's compatible with these approaches that vision represents in whatever format you like, including the very same format as propositional thought. In that case, it's not the fact that our beliefs are represented in a different format that prevents their free usage by perceptual processes—it's the fact that they're stored in the wrong place in memory.
If we were able to move them to a memory location that a visual process had access to, then that visual process might be able to use them in the same way it uses the light-from-above prior.

Modular approaches are a prominent example of architectural approaches to the perception–cognition border. These approaches locate the distinction between seeing and thinking in architectural features of processing and place few restrictions on the form of perceptual representations. They include the view that perceptual processing is distinguished by its relationship to proximal stimulation (Beck 2018), by restriction in the features it computes over (Green 2020), and by other potential architectural features such as the use of special algorithms. These views vary in the restrictions they place on information flow, with modular approaches typically positing the greatest restrictions. But they all allow (in principle) for pluralism about perceptual representation, which is arguably desirable given the many different computational demands of perception (Quilty-Dunn 2020b; Green 2023; Firestone & Phillips forthcoming). In particular, they allow that some perceptual processes might output discursive, conceptual representations. Thus the information flow from perception to cognition could proceed in those cases without translation. Representational approaches like Block's, on the other hand, must deny that the information transfer from perception to cognition works this way.

Promising test cases for the debate between these approaches are points at which perception hands information off to cognition. Our key question: does perception seem to hand information off in a way that suggests a common interlingua, or in a way that suggests a format discontinuity? We can ask more specific questions to probe this more general one, like: is the information transfer fast or slow?
Are the same representations at the interface point usable for both cognitive and perceptual processes, or only ever one or the other?

2. Visual Working Memory

My test case will be arguably the central interface point between seeing and thinking: visual working memory (VWM). VWM is a functional memory location where information that bears some important relation to visual perception can be maintained for brief intervals (on the order of seconds) and manipulated. For example, if you look at your cat and wonder whether he can make the jump to the high bookshelf he's eyeing, your visual simulation of possible outcomes makes use of a visual representation of the scene that is stored in VWM. VWM is where focused thinking about the visual world begins.

I say VWM representations "bear some important relation" to visual representations because the question in the rest of this paper will be: what relation? One possibility is that VWM simply takes the outputs of visual processing—visual percepts—and stores them. Another possibility is that VWM cannot access visual percepts directly, and instead uses representations in another (perhaps discursive, cognition-friendly) format that are merely informed by visual information.

Consider the function of VWM. One view is that the function of VWM is largely for perception: it sustains information for use in perceptual computations that unfold over larger timescales. The subjective sense that we experience a unified visual scene across noticeable intervals of time—including movements of the eyes, body, and external objects that might unfold on the order of seconds rather than milliseconds—could be grounded in the functional interaction between VWM and online vision.
Another view is that the function of VWM is largely for cognition: it transforms visual representations into discursive, conceptualized representations that can be broadcast to other cognitive systems.2 If VWM is primarily for perception, then we should expect it to sustain information in a format that perception uses. And if VWM is primarily for cognition, then we should expect it to represent in a conceptualized discursive format. Note that, if perception uses a discursive format, both these functions can be naturally accommodated: the discursive format is apt for cognitive use and, since it is native to perceptual processing, can feed back down into perception for slower perceptual computations.

One way into the function question, which I lack space to explore here, concerns nonhuman animals. If VWM is for perception, then it might have evolved prior to the cognitive abilities humans happen to (re)use it for. The relation between perception and memory is ripe for philosophical analysis, and I hope more philosophers pursue it (see, e.g., Munton 2022; Green forthcoming).

The format of VWM and its relation to perception are discussed at length in The Border Between Seeing and Thinking. There is evidence that object-file representations, the representations we use to track objects and store information about their features, are discursive; thus, perception cannot be entirely iconic (Quilty-Dunn 2020b; Green & Quilty-Dunn 2021). Block argues, however, that object files are creatures of VWM and don't reflect the format of perception. For Block, VWM representations like object files are "conceptualized versions" (p. 256) of perceptual representations (though see the final section below). VWM itself is a "cognitive scratch pad" (p. 250). While he does not explicitly characterize the function of VWM, I read him as taking VWM to be primarily for cognition.
Perceptual object representations, which Block sharply distinguishes from object files, are studied through characteristic effects such as apparent motion (discussed below), rather than through effects that are generally agreed to tap into VWM, such as the object-specific preview benefit. We can thus use our questions about the structure and function of VWM as an inroad into debates about the format of perception itself.