CHAPTER TWENTY-SEVEN
Let’s Get Rid of the Concept of an Object File
Ned Block
In a typical vision textbook you will see the term “object file” defined as follows: “An object file is a visual representation that ‘sticks’ to a moving object over time on the basis of how and where that object moves, and stores (and updates) information about what that object looks like” (Scholl and Flombaum, 2010, p. 655). Object files are said to function in working memory (Green and Quilty-Dunn, 2021; Quilty-Dunn and Green, 2021) and to ground singular thought (Murez and Recanati, 2016). One claim of this article is that although thought and working memory often preserve some perceptual information, what are called the object files of both singular thought and working memory are fundamentally different from what are called the object files of perception. Indeed, there is reason to doubt that the object files of perception can even ground singular thought. The object files of working memory and singular thought enclose the perceptual materials from perceptual object files in a cognitive envelope and in addition transform the perceptual information, often misrepresenting some aspects of the stimulus in order to make other aspects of the stimulus easier to use for a specific task. That is the problem for grounding singular thought.
A second thesis of this article is that the object files of perception (that is, perceptual object representations) are iconic in format, contrary to the claims of the “pluralists” who take them to be discursive (Quilty-Dunn, 2020b). The object files of thought and working memory by contrast are conceptual and partly discursive. The term “object file” ambiguously denotes two fundamentally different kinds of entities. We would be better off without the term.
Terminology: I will often call the object files of perception “perceptual object representations” and the object files of working memory “working memory object representations,” though I also use the term “object file” when making contact with other writers who use that term.
Contemporary Debates in Philosophy of Mind, Second Edition. Edited by Brian P. McLaughlin and Jonathan Cohen. © 2023 John Wiley & Sons Ltd. Published 2023 by John Wiley & Sons Ltd.
1. Perceptual Object Representations Are Iconic
I will start with the second of the two theses just mentioned, that perceptual object representations are iconic in format. There is a great deal of evidence for the iconic nature of all perceptual representations, including representations of object-perception. I won’t go over the evidence for perceptual representations that are not object-representations, since it isn’t controversial. For reviews of some of the evidence about the iconic nature of perceptual representation, see Chapter 5 of Block (2022), from which this article is derived, or Quilty-Dunn (2019).
I am going to present two kinds of evidence, direct and indirect, for the conclusion that perceptual object representations are iconic. But I’m not saying that these items of evidence “refute” the claim that perceptual object representations are discursive. Pluralists who combine discursive perceptual object representations with iconic perceptual representations of space and spatial features may be able to accommodate these results. The main issue is which view better explains the data: my view that all perceptual representations are iconic, or the pluralist view that some are and some are not.
One line of direct evidence for the iconic nature of object perception exploits apparent motion, a phenomenon discovered in the early twentieth century (Wertheimer, 1912).
Apparent motion occurs if a subject is shown A in Figure 27.1, followed by B, then A again, then B again, and so on. Subjects report seeing motion. At high rates of flicker between A and B, motion will be seen without intermediate stages. (This is called “phi.”) At slower flicker rates, subjects see the trajectories of the moving objects with intermediate stages clearly visible. Subjects report seeing objects of one color or shape transforming into objects of another color or shape. (That phenomenon is called beta motion.) It should be said that subjects do not confuse apparent motion with real motion, but apparent motion still looks like motion (Sperling et al., 1985).
FIGURE 27.1 (panels (a) to (d)) If A and B are quickly alternated, one sees apparent motion, usually as depicted in D. Thanks to Susan Carey for the figure (cf. Carey, 2009, p. 73).
Most subjects will see the motion in D in Figure 27.1 rather than the motion in C because the primary determinant of the motion is the visual system’s drive to minimize the distance between the items. The effect on apparent motion of path length has been estimated to be 15 times the strength of the effect of the shapes of the items involved (Flombaum and Scholl, 2006). The visual system “prefers” not to see a bird turning into a rabbit, but that “preference” is balanced against the stronger “preference” for shorter distances of motion. (This is sometimes called the principle of spatiotemporal priority.) So, subjects will see a bird crossing the screen from left to right, gradually changing into a rabbit at the top right, and the opposite transformation on the bottom. The larger the difference between the paths, the more likely the subject is to see the shorter motion (Nakayama et al., 1995). However, if the paths are roughly equal, shape counts. Path length and shape work together in an integrated manner.
The direction of motion depends in a smooth way on the distance between the items. See Figure 27.2, in which the gradual nature of this type of transition is graphed. The gradual transitions are indicative of the analog mirroring of iconic representation. The integration of smoothly varying spatial factors with factors involving object representations suggests that these are not fundamentally different kinds of representations, as would be expected if object representations in perception are discursive whereas other representations are iconic. It would be possible to combine discursive representation of objects with a spatiotemporal representation system, but to the extent that spatial and spatiotemporal effects saturate object representations, that view is less attractive.
The apparent motion stimuli just described are ambiguous in the sense that there are two very different representations that the visual system will compute in different contexts. When stimuli are ambiguous in this sense, cognitive and conceptual factors can affect which representation the visual system computes. This is a ubiquitous kind of cognitive penetration. So, one should not be surprised if cognitive information influences which kind of motion the subject sees.
FIGURE 27.2 The likelihood of seeing horizontal (rather than vertical) motion in apparent motion displays that are variants of the one in the previous figure. The horizontal axis shows horizontal distance, whereas the vertical axis graphs the likelihood (in percent) of perceiving horizontal motion rather than vertical motion (for example, the bat on the top left turning into the rabbit on the top right and the corresponding transformation across the bottom of the screen). What the graph shows is that as horizontal distance gets greater, subjects are less likely to see horizontal motion. From Nakayama et al. (1995). Thanks to Ken Nakayama for this figure.
FIGURE 27.3 A clockwise oriented bar can be seen to rotate to a counterclockwise oriented bar in apparent motion (labels in the figure: Y1, Y2, “perceived translation and rotation”). From Nakayama et al. (1995). Thanks to Ken Nakayama for this figure.
So why does apparent motion constitute evidence for iconic object-seeing as opposed to just iconic seeing of shapes? One relevant manipulation uses pairs of white bars that protrude from their black background and differ in orientation by 90° between the left and right displays, as in Figure 27.3. Subjects see the bars as rotating back and forth (instead of birds changing into rabbits). Note that the bars appear to rotate gradually. That is, the subject sees the intermediate orientations. The fact that subjects see intermediate stages of rotation suggests that the representations are part of a system that mirrors rotation operations on actual objects, again the analog mirroring characteristic of iconic representation. See Figure 27.3. The display is viewed via an apparatus that allows for independent manipulation of what is sent to each eye. Whether the white bars emerge from the background in the manner of objects is manipulated by changing binocular disparity cues. If the bars look like parts of a squarish shape instead of like protruding objects, then there is a visual experience of vertical motion but no visual experience as of rotation (Nakayama et al., 1995). If there is no apparent object, then there is no rotation. To the extent that shapes are involved, they are not 2D shapes, since the 2D outline is the same whether or not the display looks like parallel bars. What makes these representations perceptual is that the bars look like they are moving and rotating. What suggests they are iconic is the presence of smoothly varying intermediate stages of rotation and translation.
In his contribution to this volume, E.J.
Green argues that any theory, iconic or noniconic, would have to predict the apparent motion observed by Nakayama, so it provides no evidence for the iconic theory. But flickering images of the sort used by Nakayama need not produce apparent motion; indeed, if the flicker rate is sufficiently high or low, or the distance is too great, there is no apparent motion. Further, in some conditions, the first stimulus will be seen as expanding into the second stimulus. And the greater the distance between the two stimuli, the longer the time gap required to see motion, mirroring typical speeds in the actual world (Korte’s Law). The fact that the expansion, rotation, and translation are observed at all, in any circumstance, is not surprising on the iconic account but is surprising on the discursive account.
To avoid misunderstanding, I am not saying that iconic and discursive elements cannot be combined in a single representation. An iconic depiction of the shape of a street can be combined on a paper map with the name of the street. But notice that this is possible because the name itself has spatial properties: its location, orientation, and size. Indeed, the name of Doyers Street, a storied 200 foot long curved street in Chinatown in southern Manhattan, is often curved like the street on maps of Manhattan. A paper map of Manhattan uses spatial properties and relations instantiated on the paper to represent spatial properties and relations on the island of Manhattan, but brain representations do not represent space with space. Will the advocates of discursive perceptual object representations say that the putative discursive object representations in the brain have an analog of spatial properties comparable to the spatial properties of the name “Doyers Street”? We can call this suggestion the Doyers Street gambit.
A brain map of Manhattan uses an analog of spatial extent realized in the place cells and grid cells of the brain. Although some startling discoveries have been made recently about place cells and grid cells, how the brain represents space is still largely a mystery. But we can refer to the analog of space in the brain as “place–grid–cell–space.” Advocates of the Doyers Street gambit could say that the discursive object representations also instantiate place–grid–cell–spatial properties, just as the name “Doyers Street” instantiates real spatial properties. This is an interesting and adventurous hypothesis, but I know of no evidence for it. A defender of the Doyers Street gambit might say that the Nakayama result just described is evidence for it. To take this claim seriously we would need independent evidence for both the Doyers Street gambit and discursive perceptual object representations. As we will see in the second half of this article, the evidence that has been offered for discursive object representations applies to working memory, not perception. The apparent motion results are direct evidence for the iconicity of object perception because they exhibit the smooth variation indicative of analog mirroring.
I now turn to indirect evidence that perceptual object representations are iconic. More specifically, I will consider evidence that object representations in perception are so tightly integrated with other iconic representations in perception, notably spatial representations, as to put pressure on pluralism. The Doyers Street gambit is one way of resisting that pressure, but perhaps there are others. The first type of evidence I will consider involves object-based attention. (See Scholl, 2001 for a review.)
Perceptual attention can be divided into three types, depending on what is attended to: object-based attention, in which what is attended to is an object; spatial attention, in which what is attended to is a region of space; and feature-based attention, in which what is attended to is a property of objects or regions of space. The word “attention” is used in many different ways, including speaking of attention to items that cannot be perceived directly. But the kind of attention being discussed here is perceptual in that it is tightly integrated into perceptual systems and it obeys perceptual regularities such as a phenomenon known as divisive normalization (Bloem and Ling, 2019). I’ll give an example below.
Subjects show faster and more accurate processing for features belonging to the same object than for features belonging to different objects, showing that perceptual object representations are involved in the control of attention. One type of experiment that shows this is illustrated in (a) in Figure 27.4. If subjects see a cue at C, they are faster at detecting a target on the same object at S (for “same”) than an equidistant target on another object, D (for “different”). And this holds whether or not there is an occluder, as in (b). The fact that even an occluded object is subject to object-based attention indicates that the subjects are seeing the occluded objects as objects. This is not in itself evidence for iconicity, but that is coming in the next paragraph.
FIGURE 27.4 (panels (a) and (b), with locations labeled C, S, and D) If cued to C, subjects are faster to detect targets at S than at D even though S and D are equidistant from C. Thanks to Brian Scholl for this figure. See Scholl (2001).
Here is the evidence for iconicity: Object-based attention is a matter of degree.
Objects such as the vertical rectangles of Figure 27.4 show less of an object effect if the rectangles are altered so as to be less “good” as objects, for example if the bottom horizontal bar of the rectangle is deleted (Marino and Scholl, 2005). If there were a radical format difference between object perception and other perception, one would not expect such gradual effects. The difference between discursive and iconic representation is not a matter of degree. Pluralists may postulate links between spatial attention and discursive object representations. But results of the kinds just described put pressure on them to justify the extra assumptions involved in such explanations.
Another feature of object-based attention that should trouble pluralists is that attention “spreads” within an object from a cue at one end of the object (as in Figure 27.4) (Richard et al., 2008; Zhao et al., 2013). Spreading suggests representational analogs of the spatial extent of the object that mirror the spatial properties of the object.
A similar point about the integration of perceptual object representations with spatial representation applies to a phenomenon known as inhibition of return. Inhibition of return was demonstrated in a paradigm in which there are three boxes, a central box and two flanking boxes. One of the flanking boxes (say the one on the right) is cued (e.g., it suddenly brightens), so attention is drawn to it. Then the central box is cued. If a target is presented in the right box within 150 ms, there is a detection advantage (due to the residual attention to the right box), but if a target is presented in the right box after 300 ms, there is a disadvantage in detection. The upshot, now verified in many paradigms, is that for as long as 3 seconds the attention system is inhibited from returning to something that has recently been attended. But what is that something? Is it an area of space, a scene, an object, or what?
The answer is that both areas of space and objects show inhibition of return, which is not surprising since there is both object-based attention and spatial attention. The object-based effect is exhibited when what is inhibited is a return of attention to the object in which the cue originally occurred (Tipper et al., 1999). This is verified by varying other properties such as location, showing an independent effect of the same object. Object perception and spatial perception function similarly, a puzzling fact if they are fundamentally different in format.