Jan 17
======

Viswanathan/etal:2010
---------------------

How might the specific stimulus design (e.g., resynthesized CV syllables) influence the observed effects, and could the results differ with more naturalistic speech stimuli? Resynthesized stimuli may not capture the full range of acoustic properties present in natural speech, which could lead to differences in how listeners process and interpret the stimuli.

If I understand correctly, the Direct Realist Theory (Gestural) is applicable when the acoustic signal is speech, but evidence for the General Auditory theory can be found in both speech and non-speech audible signals. If this is the case, is it possible that both mechanisms work together during speech perception, but only one of them is observed depending on whether the auditory signal is speech or non-speech? For example, in the case of a non-speech auditory signal, perhaps there is simply an absence of perception of phonetic gestures?

"Fowler and Dekle (1991) tested the generality of this explanation by using a combination of haptic and auditory information" -> I'm curious as to what sorts of consonants/vowels can actually be perceived by touch. I find it hard to imagine distinguishing any sounds other than labials/labiodentals this way. I did look up this study and find it really interesting that they concluded that subjects who gained more from adding tactile information to auditory information tended to gain less from adding it to visual information, and vice versa. I would have expected individuals to be generally good or bad at incorporating tactile information across conditions.

Figure 1 -> Should the labels for the [ga] points be swapped?

I know this is the whole point of the paper, but it's really odd to think that we are reliably able to determine place of articulation from the acoustic signal yet we don’t have a consistent answer as to how we do so.

To produce different syllables along the [da]-[ga] continuum, the authors first recorded a native speaker and then synthesized different syllables based on the characteristics of his voice. Can a human speaker intentionally produce a sound at a given point on a continuum between two known sounds with reasonable accuracy? For example, is it possible to train someone to say a sound that is ~40% [da] and ~60% [ga]?