3. Evidence

Here is a simple initial question: how fast is the consolidation of information from perception into VWM? If there is a format discontinuity between the two systems, then there will be a translation process that (ceteris paribus) will take extra time.[2] So the quicker the consolidation, the stronger our credence should be that VWM represents in the same format as perception. Furthermore, Block notes (p.153) that the imagery literature suggests that transitions between discursive and iconic formats (e.g., going from the discursive symbol DOG to forming a mental image of a dog) are slow, around 1.5 seconds. We thus have a rare achievement in philosophy on our hands: an eminently testable question.

[2] A related disagreement concerns whether VWM representations have a modality-specific format that relies on activation in visual areas (Harrison & Tong 2009; Carruthers 2015) or an amodal format that relies on frontal areas (Xu 2017). This debate somewhat crosscuts my concerns here, since I want to argue that visual perception itself generates discursive representations and these are held in VWM without alteration in their format—I’ll leave neural implementation questions unaddressed here.

Fortunately, there is clear evidence. Vogel et al. (2006) measured the time-course of VWM consolidation using backward masks—visual noise presented shortly after a stimulus to disrupt visual processing (Fig. 2). For example: if (i) a stimulus is presented, (ii) the mask appears 584ms later, and (iii) participants then fail to detect a color change in the item, then that is evidence they failed to get color information from perception to VWM within 584ms. Vogel et al. used different stimulus onset asynchronies (SOAs) to see how much information was consolidated into VWM at different times. They then used change detection performance to estimate VWM capacity (represented with a K) at different SOAs. They found that, in addition to the time it takes to form a perceptual representation of a colored square (note the y-intercept in Fig. 3), the SOA needed to increase by only 50ms per item for VWM storage. In other words, consolidating one perceived item into VWM takes about 50ms. Furthermore, in an earlier paper using a different paradigm, Gegenfurtner and Sperling (1993) found that, when participants are spatially attending to a particular location in the display, items at that location can be consolidated in half that time (20-30ms).

These durations are stunningly fast: an order of magnitude faster than a blink of an eye (100-400ms) and nearly two orders of magnitude faster than the cross-format translation processes Block cites (1000-1500ms). Thus the answer to our first question is that VWM consolidation is fast, likely too fast to involve cross-format translation.

Figure 2—VWM task; Vogel et al. 2006.

Figure 3—VWM capacity as a function of target-mask SOA; Vogel et al. 2006.

If there really is format continuity between VWM and vision, then we should not only see fast consolidation. We should also see fluid interaction between the two systems, making shared use of a common representational format. We therefore have another tractable question: does vision access VWM contents in its own computations, as if they share the same kinds of representations? Or does vision simply send information “upwards” to VWM and then lose access to that information? What we’re looking for here are unambiguously perceptual effects that show direct sensitivity to VWM contents.
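Before turning to evidence on this second question, the arithmetic behind the first answer can be made concrete. The sketch below is my own illustration, not anything from Vogel et al. (2006): cowan_k is the standard “Cowan’s K” estimator of VWM capacity from change-detection hits and false alarms (that Vogel et al. used exactly this estimator is an assumption here), and consolidation_soa_ms encodes the linear model suggested by Fig. 3, where only the 50ms-per-item slope comes from the text and the intercept is a hypothetical placeholder for perceptual encoding time.

```python
# Illustrative sketch only; every value except the 50ms/item slope is hypothetical.

def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Standard "Cowan's K" estimate of VWM capacity from change detection."""
    return set_size * (hit_rate - false_alarm_rate)

def consolidation_soa_ms(n_items, encode_ms=50.0, per_item_ms=50.0):
    """Linear consolidation model: a perceptual-encoding intercept (hypothetical
    stand-in for the y-intercept in Fig. 3) plus ~50ms per consolidated item."""
    return encode_ms + per_item_ms * n_items

print(cowan_k(4, 0.90, 0.10))    # ~3.2 items for a hypothetical observer
print(consolidation_soa_ms(1))   # 100.0 ms to consolidate a single item
print(consolidation_soa_ms(4))   # 250.0 ms for four items
# Compare: the discursive<->iconic translations Block cites run 1000-1500 ms.
```

Even on generous assumptions about the intercept, consolidating an item or two lands one to two orders of magnitude below those cross-format translation times.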
There is, perhaps unsurprisingly, a large body of experimental work studying the interaction between vision and VWM. For example, VWM contents that match visual perception (e.g., have the same orientation) enhance the precision of visual experience (Salahub & Emrich 2016). However, we want to carefully pull apart visual processing proper from manipulation of visual representations in VWM itself. The processes that underlie precision of feature report, for example, could use VWM resources. And according to Block, visual experience is often conflated with visual cognition, including on cognition-based theories of consciousness (e.g., Global Workspace Theory—Baars 1993; Dehaene 2014) that take VWM to bear a special relationship to consciousness.[3]

[3] As Block puts it, “the global workspace model [is] a much better model of conceptualization of perception than of perceptual consciousness” (p.425).

Fortunately, Block details genuinely perceptual effects (Ch.2), such as popout. In a popout effect, a salient feature captures attention in a way that is barely diminished at all by an increase in the number of distractor items (Fig. 4). Hyun et al. (2009) had participants hold a display in VWM and then detect a salient change in color or orientation; they found an increase in amplitude of the N2pc event-related potential that remained constant as set size increased, just as in simultaneous popout displays (Luck & Hillyard 1994). Another clearly perceptual effect (not on Block’s list, but in the spirit of it) is motion repulsion: when you view two streams of dots moving in two different directions, the motions “repel” each other, and you see the angle separating them as larger than it actually is. Amazingly, holding one stream in VWM for two seconds is sufficient to cause participants to perceive the second stream as “repelled” away from the memorized stream (Kang et al. 2011).

Figure 4—Popout: the light circle draws attention in both displays, despite the right display having twice as many items as the left.

VWM might often affect visual perception by guiding visual attention (e.g., in the popout case), which is different from direct computational access (Quilty-Dunn 2020a). But attentional shifts are not the only source of these effects. Scocchia et al. (2013) found that perception of an ambiguous motion display is biased in the direction of a motion display held in VWM, but not in the direction of a recently attended motion display. Mendoza et al. (2011) had participants indicate whether the motion of dot displays matched a sample; the sample was either concurrently presented (attention) or shown seconds earlier (VWM); finally, while participants performed that task, another dot display “pulsed” once with motion and participants had to indicate the direction of motion. When the motion pulse matched the attended/memorized sample, participants were better at detecting its direction. They were even better when the sample was both memorized and attended than when it was merely attended, showing that the effect of VWM is not due to attention alone. These vision-VWM interactions are fluid and bidirectional (see Teng & Kravitz 2019, which I lack space to discuss).

Recall that, for Block, object representations in perception should be sharply distinguished from VWM representations. We can therefore ask whether VWM contents drive object-based effects that Block agrees probe perceptual object representations.
One such effect Block appeals to is apparent motion, the visual impression of an object moving from one location to another, created simply by seeing two objects appear one after another in different locations. Hein et al. (2021) used a Ternus display, in which the apparent motion is ambiguous between three objects moving together (group motion) and two objects staying in place while a third object “leapfrogs” from one end to the other (element motion). They colored the objects such that one color was consistent with group motion (e.g., the green object is in the middle in both displays) and another was consistent with element motion (e.g., the pink object is on the left in the first display and on the right in the second). They then had subjects hold a color in VWM for an unrelated memory task. The VWM color drove apparent motion: if they memorized green and the green Ternus item suggested group(/element) motion, then they saw group(/element) motion. Thus VWM representations drive the very perceptual effects Block uses to distinguish perceptual object representations from VWM object representations.

Taking stock: VWM representations are consolidated from perception in tens of milliseconds and enter directly into perceptual computations. The evidence supports the hypothesis that the function of VWM is to sustain the outputs of perception without transforming their format, rendering them available for perceptual computations that take place over longer intervals of time than earlier sensory memory stores can robustly manage.

4. Hybrids in VWM?

As mentioned above, Block sometimes describes VWM representations as “conceptualized versions” (p.256) of perceptual representations. However, he also suggests that VWM representations are “perceptual” (p.113) and contain “perceptual materials” (p.258) or “remnants of perception” (p.260) enclosed in a “cognitive envelope” (p.249). In that case, perhaps his view is that some of the iconic outputs of perception are simply held as such in VWM (these are the perceptual materials), but are accompanied by discursive representations (this is the cognitive envelope). Many tractable questions are raised here, such as: are the iconic elements of VWM representations limited to a certain range of properties, e.g., shape and color?[4] Is there redundant representation of properties, such that some are encoded both iconically and discursively? How do the formats interact? Does memory consolidation work differently for the two sorts of properties?

[4] Block does argue for “deep differences between perception and the perceptual materials used in working memory” (p.16) even for color.

Given the speed of VWM consolidation, Block could say (1) that consolidation actually has two phases: first, iconic outputs of perception are transferred to VWM in tens of milliseconds; second, a slower process adds discursive constituents. He could also argue (2) that only the iconic elements of VWM enter into perceptual computations, with the discursive elements playing a purely cognition-facing role.

I know of no evidence that directly probes (1). However, it is problematic to suppose that the abstract, categorical properties represented in object files take significantly longer to form than other properties. Mandelbaum (2018) argues that visual categorization happens in tens of milliseconds, citing studies like Potter et al. 2014, in which images of scenes are shown one after another for 13ms each, so that each image masks the one before it.
Block (pp.329-330) argues that these images are not effective masks, citing Maguire & Howe’s (2016) follow-up study showing that lower-level stimuli (lines/edges) are better masks at these presentation times and eliminate evidence of rapid categorization. However, our question here is whether encoding conceptual categories is significantly slower than encoding low-level features. In a follow-up to the follow-up, Howe (2017) found that the minimal presentation time for categorization was about the same as that for detecting color and orientation (~35ms). It is therefore implausible that consolidation of discursive representations into VWM is significantly slower than consolidation of iconic elements.

What about (2), the prediction that vision only accesses the iconic elements of VWM representations? The best evidence to the contrary concerns transsaccadic memory, i.e., the visual system’s ability to maintain a coherent percept across eye movements without having to restart visual perception from scratch with each new fixation. As I’ve argued elsewhere, abstract categorical information in object files is preserved in transsaccadic memory (Quilty-Dunn 2020b, Section 5). Furthermore, the use of low-level properties like color in transsaccadic memory is governed by these abstract categorical properties: e.g., color is used to track object identity across eye movements if the object category has a diagnostic color (banana) but not if it doesn’t (bucket) (Gordon & Vollmer 2010).

Block argues that transsaccadic memory is essentially just VWM. But its significance for perception shouldn’t be dismissed on these grounds. Visual perception—not just cognition, but visual perception itself—needs coherence across eye movements. And as we’ve seen, vision seems to access VWM contents regularly. So why deny that object files in transsaccadic memory/VWM play this foundational role in securing a coherent visual percept in spite of our constantly moving eyes? Block points to the possible role of long-term memory in transsaccadic memory, but this might simply be visual long-term memory, which is high-fidelity and arguably distinct from the “long-term memory” where beliefs and other cognitive states are held (Brady et al. 2008).

The following view appears increasingly plausible: our visual systems construct representations in various formats, including discursive object files; some of these (including object files) are held in VWM to be re-used in visual processing, including transsaccadic processes, and also to subserve cognition. If this view were true, many of the insights in Block’s book about iconic format in perception would remain unaffected. The only cost would be giving up the claim that there are no other formats in perception.

