Object recognition is a multisensory process that involves different types of neural representation, with modality-specific sensory signals merging into a coherent object representation. For example, we can recognize a cup by seeing it, touching it, hearing someone tap on it, or listening to someone describe it with words. How these different sensory inputs merge into a supramodal object representation, and what the nature of such a potentially supramodal representation is, remain to be fully understood.
Shape is a classic example of a presumed supramodal representation. It can be acquired and accessed through vision, touch, and language description, and there is rich evidence in the literature suggesting at least some shared cognitive and neural representations across these input modalities. For people with typical sight development, who can perceive object shape with vision and touch simultaneously, the perceptual similarity spaces derived from these two modalities were found to be highly correlated behaviorally, indicating good alignment between them (Erdogan et al., 2015; Lee Masson et al., 2016). Neuroimaging studies supported this alignment by showing that both visual and haptic input of objects activated a common brain area in the lateral occipital cortex (LOC), that the neural representational similarity spaces of vision and touch were correlated with the behavioral similarity space (Amedi et al., 2001, 2002; Erdogan et al., 2016; Lee Masson et al., 2016; Stilla & Sathian, 2008), and that the neural representations supported cross-modal decoding (Erdogan et al., 2016; Tian et al., 2023). While such visual-tactile overlap observed in sighted individuals could be attributed to imagery, compelling evidence has also been obtained from congenitally blind individuals. Behavioral and neuroimaging studies showed that, without visual experience, blind people's behavioral patterns in shape similarity rating and sorting tasks were highly similar to those of sighted people (Kim et al., 2019; Peelen et al., 2014). Such behavioral similarity was also supported by a similar neural basis. The visual ventral cortex shows similar activation profiles in sighted and congenitally blind individuals (e.g., Mahon et al., 2009; Pietrini et al., 2004). The LOC can be activated by touch or by hearing visual-to-auditory sensory substitution soundscapes in both sighted and blind subjects (Amedi et al., 2007, 2010), the representational space in this area in blind subjects was correlated in structure with that derived from pictorial forms in sighted subjects (Handjaras et al., 2016, 2017), and the neural representational similarity pattern was significantly correlated with behavioral shape similarity ratings in both groups (Peelen et al., 2014; Xu et al., 2023). Finally, language experience also contributes to object property learning (Bi, 2021; Wang et al., 2020), and shape knowledge can be constructed at least partly from language descriptions (see discussions in Kim et al., 2019).
Although these lines of evidence converge on a supramodal shape representation across visual, tactile, and language experiences, the evidence is predominantly based on (dis)similarity structures (e.g., a paintbrush is more similar in shape to a razor than to a pair of scissors), which could arise even when the underlying representations of individual object shapes differ. Even similar representations that allow for cross-modal decoding might have different tuning properties (Breedlove et al., 2020; Favila et al., 2022). We do not know from the existing evidence whether there are any differences in how congenitally blind people, without vision, represent the shape of a cup compared with sighted people. In fact, past research has implicated a complex interaction among object property (object identity vs. shape), domain (animate vs. artifact), and information modality (visual vs. nonvisual; sighted vs. blind). Modality independence tended to be observed when shape information was accessed explicitly (in input or output) and for items whose shape properties map onto nonvisual properties relatively directly (e.g., when object sounds convey shape information through sensory substitution mappings, or emotional expressions with systematic facial shape correspondences), relative to when object identity was accessed and/or when object shape properties do not map transparently onto other properties (e.g., identity sounds or vocal sounds without a transparent mapping to shape; Amedi et al., 2002, 2007; Bola et al., 2022; Mattioni et al., 2020; Wang et al., 2015; see discussions of interactions with object domain in Bi et al., 2016). Thus, an explicit assessment of object shape representations across different types of objects and tasks is warranted.
There are different possibilities, with different predictions, about how multimodal inputs work together to derive shape representations. One is that the same supramodal representation can be generated independently from visual or tactile inputs, i.e., either modality is sufficient, which predicts that blind and sighted people have the same shape representation for objects with which they have full tactile experience. For example, touching or seeing a cup yields the same representation in the blind and the sighted. For objects that they have not touched (e.g., a rocket or an elephant), there are two possibilities. One is that shape representations are correspondingly impoverished in the blind; the other is that blind people can nonetheless learn shape compositions from language descriptions and combine them with their tactile experience of certain shape elements, despite the lack of experience with the actual object. For instance, a person may learn from language that an elephant is a big mammal with a nose like a long tube and big ears like fans, forming a composite representation from the shapes of tubes and fans. This latter possibility accommodates the same shape representation in blind and sighted people even for objects without direct tactile experience. Finally, it is also possible that the visual and tactile modalities differ intrinsically in how they derive shape. For instance, the visual modality tends to access object shape in parallel and is less constrained by size, whereas the tactile modality accesses shape more sequentially for objects larger than the palm and is more tightly tied to manipulation and functional object properties. Having both modalities available in the sighted leads to convergent shape representations, but in the absence of one modality, the shape representation constructed from the remaining modality might differ. According to this hypothesis, even for objects with which they have rich tactile experience, the congenitally blind might have shape representations different from those of the sighted.
To assess how visual, tactile, and language inputs affect shape representations, we compared congenitally blind and sighted participants in explicit object shape production experiments. We chose three domains of objects with which both groups were highly familiar but had different degrees of tactile experience: tools (high), large nonmanipulable objects (medium), and animals (low). Three shape knowledge production behavioral experiments were conducted on these objects: verbal generation of object features (a language task), clay modelling with Play-Doh (3D shape representation), and drawing (2D shape representation). The clay modelling experiment (Exp 2) is the most transparent and is considered the main experiment, with the verbal feature task replicating and extending the literature and the drawing task exploring the 2D transformation of object shape in the two populations. We tested potential group differences in two aspects: the general quality of shape knowledge production (How good is each response?) and inter-subject consistency (How variable is each response relative to those of the other participants?); one possible way to quantify such consistency is sketched below. Specific measures for these two aspects were chosen based on their feasibility and optimality for each experiment.
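As a minimal illustration (not the analysis pipeline used in these experiments), inter-subject consistency can be operationalized as each participant's mean correlation with the other participants' responses to the same item, assuming each response has first been converted into a numeric feature vector (e.g., descriptors of a clay model or drawing). The function name, feature dimensionality, and use of Pearson correlation here are all illustrative assumptions.

```python
# Hypothetical sketch: leave-one-out style inter-subject consistency.
# Assumes each participant's response to one object item is a feature vector.
import numpy as np

def inter_subject_consistency(responses: np.ndarray) -> np.ndarray:
    """responses: (n_subjects, n_features) array for a single object item.
    Returns one consistency score per subject: their mean Pearson correlation
    with every other subject's response to the same item."""
    r = np.corrcoef(responses)      # n_subjects x n_subjects pairwise correlations
    np.fill_diagonal(r, np.nan)     # exclude each subject's self-correlation
    return np.nanmean(r, axis=1)    # average each subject against all others

# Toy usage: 5 hypothetical participants, 10-dimensional response features
rng = np.random.default_rng(0)
scores = inter_subject_consistency(rng.normal(size=(5, 10)))
print(scores)
```

Higher scores would indicate responses that are more typical of the group; comparing score distributions between blind and sighted groups is one way group differences in consistency could be assessed.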