Dec 13
======

Prom-on/etal:2009
-----------------

"Also, as seen in Fig. 3, while the system performance is improved from
second to third order in terms of both RMSE and correlation, there is
little improvement from third to fourth order."
-> It doesn't seem like performance actually improved much from second
to third order either. I'm also unsure where the numbers 3 and 0.8
come from in the justification for the ceiling function.

Table V -> It's curious to note that rising and falling tones have
similar values for b, meaning that they move towards a similar pitch
target. It's generally said (at least in Chinese classes) that rising
tones go from low to high and falling tones from high to low, but it
would seem a more appropriate characterization is that they start low
or high, respectively, and move towards the middle.

"Their judgment of the naturalness of speech is nearly identical
between natural and synthetic F0."
-> This seems like a bit of a lofty conclusion given the Mandarin data
in Fig. 11.

The authors evaluate their model using Mandarin, a prototypical tonal
language, and English, a prototypical intonational language. Would the
model perform similarly well on languages with more complex intonation
systems or mixed tone and intonation systems?

Are target approximation models still in use in any context?

The authors claim that "the slopes of the dynamic targets are
essential to the dynamic tones like R and F because their F0
variability at different speech rates cannot be adequately simulated
by sequences of static targets such as low+high for rise or high+low
for fall."  Can you explain why sequences of static targets are not
sufficient for simulating F0 variability at different speech rates and
how that relates to the slopes of the dynamic targets?
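
A rough sketch to explore this question numerically. It assumes the
third-order critically damped form that I recall from the paper,
f0(t) = (m*t + b) + (c1 + c2*t + c3*t^2) * exp(-lam*t), with the F0
value, velocity, and acceleration carried over at each syllable
boundary. The function names and all parameter values (m, b, lam,
durations) are made up for illustration and are not taken from the
paper's tables. One possible reading of the claim: with a single
dynamic target the final F0 velocity settles at the slope m at any
duration, while with low+high static targets the velocity decays
towards zero as the syllable gets longer.

```python
import math


def qta_state(t, m, b, lam, f0_0, v0=0.0, a0=0.0):
    """Return (f0, velocity, acceleration) at time t within one target.

    Target is the line m*t + b; lam is the rate of approach.
    f0_0, v0, a0 are the initial F0 state at t = 0.
    """
    # Coefficients fixed by the initial conditions (assumed form).
    c1 = f0_0 - b
    c2 = v0 + c1 * lam - m
    c3 = (a0 + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0

    decay = math.exp(-lam * t)
    poly = c1 + c2 * t + c3 * t * t
    dpoly = c2 + 2.0 * c3 * t

    f0 = m * t + b + poly * decay
    v = m + (dpoly - lam * poly) * decay
    a = (2.0 * c3 - 2.0 * lam * dpoly + lam ** 2 * poly) * decay
    return f0, v, a


def run_targets(targets, f0_start, lam=40.0):
    """Chain (m, b, duration) targets; return the final (f0, v, a)."""
    state = (f0_start, 0.0, 0.0)
    for m, b, duration in targets:
        # The end state of one target is the start state of the next.
        state = qta_state(duration, m, b, lam, *state)
    return state


# Hypothetical rising tone at several syllable durations (seconds).
for d in (0.15, 0.30, 0.60):
    dynamic = run_targets([(40.0, -4.0, d)], f0_start=0.0)
    static = run_targets([(0.0, -4.0, d / 2), (0.0, 4.0, d / 2)],
                         f0_start=0.0)
    print(f"d={d:.2f}s  dynamic v={dynamic[1]:7.2f}  "
          f"static v={static[1]:7.2f}  (semitones/s)")
```

If this reading is right, the static sequence flattens into a plateau
at slow speech rates, whereas the dynamic target keeps rising at its
own slope, which may be what the authors mean by the slope being
essential for R and F.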