Nov 25
======

Daland/Pierrehumbert:2011
-------------------------

How is the diphone-based segmentation model constructed, and what are its advantages? Has this work been extended and incorporated into speech-to-text systems?

How robust is the model with regard to dialects? Could we train a model on language A and test it on language B? Could we infer how close languages A and B are from how well a cross-trained model performs (especially the unsupervised one)?

The paper discusses word segmentation extensively. It states that word segmentation can facilitate the learning process but must be separated from it. In a real-life situation, children have a comprehensive language environment provided by the adults speaking around them, but they also receive explicit word segmentation instruction: parents often split words into syllables to teach children how to speak, for example repeating a word like "mama" as "ma - ma". Doesn't this conflict with the view that the learning process is separate from word segmentation?

Words not listed in the dictionary were omitted. A large share of these words usually corresponds to the children's own output repeated back to them (e.g. "you want baba?" looks like an adult response to a child's utterance of "baba"). Wouldn't that make these words (or even babble at an earlier stage) pivotal in recognizing boundaries?
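As a starting point for the first question, here is a minimal sketch of the general idea behind diphone-based segmentation: estimate, for each pair of adjacent phones, how often a word boundary falls between them, and posit a boundary wherever that probability exceeds a threshold. This is a supervised toy version that counts boundaries in already-segmented input, not the paper's unsupervised estimator. The function names `train_diphone_model` and `segment`, the one-character-per-phone encoding, and the 0.5 threshold are assumptions made for illustration.

```python
from collections import Counter

def train_diphone_model(utterances):
    """Estimate p(boundary | diphone) from word-segmented utterances.

    utterances: list of utterances, each a list of non-empty words,
    each word a string with one character per phone (an assumption).
    """
    boundary = Counter()  # diphone occurrences that span a word boundary
    total = Counter()     # all diphone occurrences inside an utterance
    for words in utterances:
        for i, word in enumerate(words):
            # Word-internal diphones never contain a boundary.
            for pair in zip(word, word[1:]):
                total[pair] += 1
            # The diphone formed by this word's last phone and the
            # next word's first phone spans a boundary.
            if i + 1 < len(words):
                pair = (word[-1], words[i + 1][0])
                total[pair] += 1
                boundary[pair] += 1
    return {pair: boundary[pair] / n for pair, n in total.items()}

def segment(phones, p_boundary, threshold=0.5):
    """Split an unsegmented phone string wherever p(boundary) > threshold."""
    if not phones:
        return []
    words, current = [], phones[0]
    for x, y in zip(phones, phones[1:]):
        # Unseen diphones default to 0.0, i.e. no boundary is posited.
        if p_boundary.get((x, y), 0.0) > threshold:
            words.append(current)
            current = y
        else:
            current += y
    words.append(current)
    return words

# Toy training data, using letters as stand-ins for phones.
toy_corpus = [["the", "dog"], ["the", "cat"], ["a", "dog"]]
toy_model = train_diphone_model(toy_corpus)
print(segment("thedog", toy_model))
```

Because unseen diphones default to "no boundary", a sketch like this undersegments input containing diphones it never saw across a boundary. That is one concrete way to frame the dialect and cross-language questions above: a model trained on language A only transfers to language B to the extent that their boundary-spanning diphones overlap.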