Nov 25
======

Daland/Pierrehumbert:2011
-------------------------

How is the diphone-based segmentation model constructed, and what are
its advantages?
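
As a starting point for the construction question, here is a minimal
sketch of the general idea as I understand it: estimate, for every
diphone (pair of adjacent phones), how often it spans a word boundary,
then posit a boundary wherever that estimate exceeds a threshold. This
covers only a supervised baseline trained on already segmented text and
is not the paper's implementation. The function names, the 0.5
threshold, the toy corpus, and the use of letters as stand-ins for
phones are all assumptions made for illustration.

```python
from collections import Counter


def train_diphone_model(segmented_utterances):
    """Estimate P(word boundary | diphone) from segmented utterances.

    segmented_utterances: list of utterances, each a list of words;
    each character of a word stands for one phone (toy assumption).
    """
    total = Counter()     # diphone -> number of occurrences
    boundary = Counter()  # diphone -> occurrences spanning a word boundary
    for words in segmented_utterances:
        phones = "".join(words)
        # Indices in `phones` at which a new word starts (utterance-internal).
        starts = set()
        pos = 0
        for word in words[:-1]:
            pos += len(word)
            starts.add(pos)
        for i in range(1, len(phones)):
            diphone = phones[i - 1:i + 1]
            total[diphone] += 1
            if i in starts:
                boundary[diphone] += 1
    return {d: boundary[d] / total[d] for d in total}


def segment(phones, model, threshold=0.5):
    """Insert a boundary inside every diphone whose estimate exceeds threshold."""
    words, start = [], 0
    for i in range(1, len(phones)):
        # Unseen diphones default to 0.0, i.e. "no boundary" (assumption).
        if model.get(phones[i - 1:i + 1], 0.0) > threshold:
            words.append(phones[start:i])
            start = i
    words.append(phones[start:])
    return words


# Toy corpus: ordinary spelling stands in for a phonemic transcription.
corpus = [["the", "dog"], ["the", "cat"], ["a", "dog"]]
model = train_diphone_model(corpus)
print(segment("thedog", model))
print(segment("acat", model))
```

The second call leaves "acat" unsegmented because the diphone "ac"
never occurs in the toy corpus, which hints at why the handling of
unseen diphones matters for the dialect and cross-language questions
below.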

Has this work been extended and incorporated into speech-to-text
systems?

How robust is the model with regard to dialects? Could we train a model
on language A and test it on language B? Could we infer how close
languages A and B are from how well a cross-trained model works
(especially the unsupervised one)?

The paper discusses word segmentation extensively: it states that
word segmentation can facilitate the learning process but must be
kept separate from it. In real life, children have a comprehensive
language environment provided by the adults speaking around them, but
they also receive explicit word segmentation instruction, since
parents usually split words into syllables when teaching children to
speak. For example, they repeat words like "mama" as "ma - ma".
Doesn't this conflict with the view that the learning process is
separate from word segmentation?

Words not listed in the dictionary were omitted. Usually a large share
of these words is the children’s own output repeated back to them
(e.g. "you want baba?" looks like an adult response to a child's
utterance of "baba"). Wouldn't that make these words (or even babbles
at an earlier stage) pivotal in recognizing boundaries?