A hybrid approach to 3D tongue modeling from vocal tract MRI using unsupervised image segmentation and mesh deformation

Alexander Hewer, Ingmar Steiner and Stefanie Wuhrer


This video contains supplementary material for our paper submitted to Interspeech 2014. A detailed description can be found below.


Important note: The video should be watched in high resolution to see every detail.

Static 3D MRI

First, for three scans of the Baker dataset, we show the following scenes:

  1. Volume rendering of the 3D scan
  2. Volume rendering of the segmentation
  3. Extracted point cloud
  4. Template placed in the point cloud
  5. Evolution of the template mesh during the template matching optimization
  6. 2D point cloud obtained by using only the mid-sagittal slice
  7. Template placed in the 2D point cloud
  8. Evolution of the template mesh if the 2D point cloud is used
  9. Comparison between the mesh obtained from full 3D data and the one obtained from mid-sagittal information
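
To illustrate scene 6, here is a minimal sketch of how a point cloud could be restricted to the mid-sagittal slice. It is not the code used for the video. The function name, the assumption that axis 0 runs left to right, the choice of the mid-plane as the midpoint of the cloud's extent along that axis, and the `tolerance` parameter are all hypothetical.

```python
def mid_sagittal_points(points, axis=0, plane=None, tolerance=0.5):
    """Keep the points that lie within `tolerance` of a sagittal plane.

    points    -- list of (x, y, z) tuples
    axis      -- index of the left-right axis (assumed to be 0 here)
    plane     -- coordinate of the plane along `axis`; if None, the midpoint
                 of the cloud's extent along `axis` is used (an assumption)
    tolerance -- half-thickness of the slice, in the same units as the points
    """
    if not points:
        return []
    if plane is None:
        coords = [p[axis] for p in points]
        plane = (min(coords) + max(coords)) / 2.0
    return [p for p in points if abs(p[axis] - plane) <= tolerance]


# Toy cloud: only the two points near x = 5 survive the filter.
cloud = [(0.0, 1.0, 1.0), (5.0, 2.0, 2.0), (5.3, 3.0, 3.0), (10.0, 4.0, 4.0)]
slice_points = mid_sagittal_points(cloud)
```

The remaining points all lie close to one plane, which is what makes the resulting cloud effectively two-dimensional.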

The approximate starting time of each scene (in mm:ss) is given in the following table:

          Scene 1  Scene 2  Scene 3  Scene 4  Scene 5  Scene 6  Scene 7  Scene 8  Scene 9
Scan [ɑ]  00:04    00:07    00:10    00:13    00:15    00:21    00:24    00:26    00:33
Scan [ɪ]  00:35    00:39    00:42    00:45    00:46    00:53    00:56    00:58    01:04
Scan [l]  01:07    01:10    01:13    01:16    01:18    01:25    01:27    01:29    01:36

The evolution scenes show the successive levels of the coarse-to-fine strategy: the matching starts with a low-resolution mesh, which is refined on each subsequent level until the original resolution is reached. On each level, we visualize the state of the template after each energy in the series has been minimized.
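
The control flow of such a coarse-to-fine scheme can be sketched on a toy problem. The example below is only an illustration and not the method from our paper: the "mesh" is a 1D list of heights, refinement inserts midpoints, and the two update steps (`data_step`, `smooth_step`) are hypothetical stand-ins for the energy minimizations. What it shows is the loop structure: optimize on a level, record the state after each step, then refine until the full resolution is reached.

```python
def refine(samples):
    """Double the resolution by inserting the midpoint between neighbours."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.extend([a, (a + b) / 2.0])
    out.append(samples[-1])
    return out


def data_step(samples, target, weight=0.5):
    """Stand-in for a data energy: pull each sample toward the target."""
    n = len(samples)
    return [s + weight * (target(i / (n - 1)) - s) for i, s in enumerate(samples)]


def smooth_step(samples, weight=0.5):
    """Stand-in for a smoothness energy: pull interior samples toward
    the mean of their two neighbours."""
    out = list(samples)
    for i in range(1, len(samples) - 1):
        mean = (samples[i - 1] + samples[i + 1]) / 2.0
        out[i] = samples[i] + weight * (mean - samples[i])
    return out


def coarse_to_fine(initial, target, full_resolution):
    """Run the step series on each level, refining between levels.

    Returns the final state and a list of (level, step name, state)
    snapshots, one per step, which is what the video visualizes.
    """
    steps = (("data", lambda s: data_step(s, target)), ("smooth", smooth_step))
    state = list(initial)
    snapshots = []
    level = 0
    while True:
        for name, step in steps:
            state = step(state)
            snapshots.append((level, name, list(state)))
        if len(state) >= full_resolution:
            break
        state = refine(state)
        level += 1
    return state, snapshots


# Three levels: 3 samples, then 5, then 9 (the full resolution).
final, snapshots = coarse_to_fine([0.0, 0.0, 0.0], lambda x: x * (1.0 - x), 9)
```

Each entry of `snapshots` corresponds to one visualized state: the level index, the step that was just applied, and a copy of the samples at that moment.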

As in our paper, the difference between the two meshes is shown as a heat map on the mesh obtained from the 2D data. The minimum and maximum distances used to derive the color scale are given below:

          Minimum   Maximum
Scan [ɑ]  0.076114  9.277325
Scan [ɪ]  0.275535  12.69881
Scan [l]  0.68881   10.79903
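
As an illustration of how such a scale can be derived, the following minimal sketch computes per-vertex distances between two meshes and maps them to colors. It assumes that both meshes have the same number of vertices in the same order (plausible here because both are deformations of one template, but an assumption nonetheless) and that the distance is the Euclidean distance between corresponding vertices. The function names and the blue-to-red color ramp are hypothetical and need not match the rendering in the video.

```python
import math


def vertex_distances(mesh_a, mesh_b):
    """Euclidean distance between corresponding vertices of two meshes."""
    if len(mesh_a) != len(mesh_b):
        raise ValueError("meshes must have the same number of vertices")
    return [math.dist(p, q) for p, q in zip(mesh_a, mesh_b)]


def normalize(distances):
    """Return (minimum, maximum, values rescaled to [0, 1])."""
    lo, hi = min(distances), max(distances)
    span = hi - lo
    if span == 0:
        # All distances are equal: map everything to the bottom of the scale.
        return lo, hi, [0.0] * len(distances)
    return lo, hi, [(d - lo) / span for d in distances]


def heat_color(t):
    """Map t in [0, 1] to an RGB triple, from blue (0) to red (1)."""
    return (t, 0.0, 1.0 - t)


# Toy example with three vertices per mesh.
mesh_3d = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
mesh_2d = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 2.0)]
minimum, maximum, scaled = normalize(vertex_distances(mesh_3d, mesh_2d))
colors = [heat_color(t) for t in scaled]
```

Because the scale is normalized per scan, the same color corresponds to a different absolute distance in each scan, which is why the minimum and maximum are listed separately above.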

Real-time 2D MRI

Starting at 01:39, we show an animation of the meshes obtained from frames 76 to 434 of the Niebergall dataset.