Dec 13
======

Prom-on/etal:2009
-----------------

"Also, as seen in Fig. 3, while the system performance is improved from second to third order in terms of both RMSE and correlation, there is little improvement from third to fourth order."
-> It doesn't seem like it actually improved much from second to third order. I'm also a bit confused by where the numbers 3 and 0.8 are coming from for the ceiling function justification.

Table V
-> It's curious to note that rising and falling tones have similar values for b, meaning that they move towards a similar pitch target. It's generally said (at least in Chinese classes) that rising tones go from low to high and falling tones from high to low, but it would seem a more appropriate characterization is that they start high or low, respectively, and move towards the middle.

"Their judgment of the naturalness of speech is nearly identical between natural and synthetic F0."
-> This seems like a bit of a lofty conclusion given the Mandarin data in Fig. 11.

The authors evaluate their model using Mandarin, a prototypical tonal language, and English, a prototypical intonational language. Would the model perform similarly well on languages with more complex intonation systems or mixed tone and intonation systems?

Are target approximation models still in use in any context?

The authors claim that "the slopes of the dynamic targets are essential to the dynamic tones like R and F because their F0 variability at different speech rates cannot be adequately simulated by sequences of static targets such as low+high for rise or high+low for fall." Can you explain why sequences of static targets are not sufficient for simulating F0 variability at different speech rates and how that relates to the slopes of the dynamic targets?
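
To poke at the static-versus-dynamic target question myself, here is a minimal sketch of a target-approximation-style simulation. It assumes the third-order critically damped form as I understand it, f0(t) = (m*t + b) + (c1 + c2*t + c3*t^2) * exp(-lam*t), with the transient coefficients set by the F0, velocity, and acceleration carried over from the previous syllable. The function names (`end_state`, `run_sequence`) and all parameter values are made up for illustration and are not taken from the paper's tables.

```python
import math

def end_state(m, b, lam, duration, state0):
    """State (f0, f0', f0'') after approaching target x(t) = m*t + b for `duration` s."""
    y0, v0, a0 = state0
    # Transient coefficients fixed by the initial state (continuity at syllable onset).
    c1 = y0 - b
    c2 = v0 + c1 * lam - m
    c3 = (a0 + 2 * c2 * lam - c1 * lam ** 2) / 2
    t = duration
    decay = math.exp(-lam * t)
    p = c1 + c2 * t + c3 * t * t   # polynomial part of the transient
    dp = c2 + 2 * c3 * t           # its first derivative
    ddp = 2 * c3                   # its second derivative
    f0 = m * t + b + p * decay
    vel = m + (dp - lam * p) * decay
    acc = (ddp - 2 * lam * dp + lam ** 2 * p) * decay
    return f0, vel, acc

def run_sequence(targets, state0):
    """Chain targets (m, b, lam, duration), passing each end state to the next target."""
    state = state0
    for m, b, lam, duration in targets:
        state = end_state(m, b, lam, duration, state)
    return state

if __name__ == "__main__":
    start = (0.0, 0.0, 0.0)   # assumed: f0 in semitones re. an arbitrary reference, at rest
    lam = 40.0                # assumed rate of target approximation (1/s), illustrative only
    for dur in (0.30, 0.15):  # a slow and a fast syllable
        # One dynamic target: a line rising at 20 st/s that reaches +3 st at syllable offset.
        dynamic = [(20.0, 3.0 - 20.0 * dur, lam, dur)]
        # Two static targets (low, then high), each given half of the syllable.
        static = [(0.0, -3.0, lam, dur / 2), (0.0, 3.0, lam, dur / 2)]
        print(dur, run_sequence(dynamic, start), run_sequence(static, start))
```

Running it prints the final F0, velocity, and acceleration for each setup at the two durations, which gives a starting point for seeing how each one behaves as the syllable shortens. I haven't checked these toy numbers against the paper's own figures.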