Computational Linguistics & Phonetics Fachrichtung 4.7 Universität des Saarlandes
Research



Recently, I have been working on both syntactic and semantic text processing, in particular textual inference, multilingual parsing, and machine translation (ongoing).

Textual Inference

Roughly speaking, the ultimate goal of textual inference is to enable computers to draw inferences from natural language texts: to decide whether one text can be inferred from another, or whether two texts have the same or a similar meaning. On the one hand, it touches the key issue of connecting meaning representations with various linguistic expressions; on the other hand, it also serves real applications, such as question answering, information extraction, and machine translation.

  My earlier work, in my master's thesis, used a subsequence-kernel-based machine learning method to measure the similarity between dependency paths (Wang and Neumann, 2007a). Because this method had relatively high accuracy but low coverage, I developed more specialized modules to deal with other cases of entailment that the previous approach could not cover. For instance, my colleagues at DFKI and I investigated entailment cases with temporal expressions (Wang and Zhang, 2008), and also with other named entities (NEs), such as location names, using a geographical ontology (Wang and Neumann, 2008c). Later, I also collaborated with a colleague in my department on applying inference rules to this task (Dinu and Wang, 2009). In addition to this "specialized" view of the problem, I explored the possibilities of connecting this task to other tasks. I proposed a looser measurement, relatedness, to partially filter out the non-entailment cases (Wang and Zhang, 2009), and further extended it with two other dimensions, inconsistency and inequality. Finally, we built a multi-dimensional classification model as a unified approach to recognizing multiple textual semantic relations: paraphrase, entailment, contradiction, and others (Wang and Zhang, 2011).
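
  As a rough illustration of what a subsequence kernel over dependency paths computes, here is a minimal sketch. It is not the implementation from Wang and Neumann (2007a): it assumes a generic gap-weighted subsequence kernel, computed by brute-force enumeration, and represents a dependency path as a list of relation labels. The label names, the function names, and the parameters n and decay are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def subsequence_features(path, n, decay):
    """Weight every length-n subsequence of `path`, allowing gaps.

    Each occurrence contributes decay ** span, where span is the number of
    positions it covers in `path`, so gappy matches count for less.
    """
    features = defaultdict(float)
    for idx in combinations(range(len(path)), n):
        span = idx[-1] - idx[0] + 1
        features[tuple(path[i] for i in idx)] += decay ** span
    return features


def subsequence_kernel(s, t, n=2, decay=0.5):
    """Inner product of the two subsequence feature maps."""
    fs = subsequence_features(s, n, decay)
    ft = subsequence_features(t, n, decay)
    return sum(w * ft[u] for u, w in fs.items() if u in ft)


def path_similarity(s, t, n=2, decay=0.5):
    """Kernel value normalized to [0, 1], so identical paths score 1."""
    k_ss = subsequence_kernel(s, s, n, decay)
    k_tt = subsequence_kernel(t, t, n, decay)
    if k_ss == 0 or k_tt == 0:
        return 0.0  # a path shorter than n has no subsequences to compare
    return subsequence_kernel(s, t, n, decay) / sqrt(k_ss * k_tt)


# Hypothetical dependency paths, written as sequences of relation labels.
path_a = ["nsubj", "dobj", "prep"]
path_b = ["nsubj", "prep"]
similarity = path_similarity(path_a, path_b)
```

  The enumeration is exponential in n and is only meant for short paths; a practical system would use a dynamic-programming formulation and feed the kernel into a kernel-based classifier.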

  I participated in the recent Recognizing Textual Entailment (RTE) challenges with my colleagues. In RTE-4, we ranked 3rd among 26 groups from both research institutes and industry (Wang and Neumann, 2009); in RTE-5, we placed 2nd among 20 teams (Wang et al., 2009c). Furthermore, we explored the use of this technology in other applications by participating in the Answer Validation Exercise (AVE), where we achieved the best results for both English (Wang and Neumann, 2007b) and German (Wang and Neumann, 2008b). We also participated in SemEval-2010 shared task #12, parser evaluation using textual entailment (PETE), and ranked 3rd with our RTE system (Wang and Zhang, 2010b).

  In addition to working on the problem itself, I have also been involved in building resources for textual inference. My colleagues and I constructed two corpora: one is manually annotated with a new scheme of six categories of textual semantic relations (Wang and Sporleder, 2010); the other uses crowdsourcing to collect data from the Web (Wang and Callison-Burch, 2010). These two corpora provide, respectively, an alternative annotation scheme and an alternative data collection method to the existing datasets.

Multilingual Parsing

To obtain a proper meaning representation of a text, I also work on syntactic parsing and semantic role labeling, which are both fundamental tasks in natural language processing (NLP). My colleagues and I participated in the CoNLL 2008 shared task and achieved 2nd place in syntactic parsing and 7th place in semantic role labeling among the 24 submissions (Zhang et al., 2008). In addition, we obtained 1st place in the open challenge, in which any external resources could be used. We were the only team to show improvement after using a handcrafted HPSG grammar.

  Furthermore, we showed that a standard semantic role labeler can achieve better performance when it is stacked with features from handcrafted HPSG grammars (together with their semantic representations in the form of Minimal Recursion Semantics). The improvement can be observed for English, German, Japanese, and Spanish, and it is larger on the out-of-domain datasets for both English and German. Later, we found that state-of-the-art statistical parsers can also benefit from these handcrafted grammars (Zhang and Wang, 2009), which confirms our assumption that these grammars capture more general linguistic insights.

  In addition to the languages mentioned above, I have also done research on (Mandarin) Chinese parsing, including combining constituent and dependency parsing on the Tsinghua Chinese Treebank (Wang et al., 2009b; Wang and Zhang, 2010a) and discriminative parse reranking with homogeneous and heterogeneous annotations (Sun et al., 2010).

Machine Translation (ongoing)

I am currently working on the EuroMatrixPlus project, which aims to advance the state of the art in machine translation (MT) research and bring it to users. My research focus is to explore the relationship between MT and semantic tasks and resources (in the general sense). This includes two aspects: 1) to discover whether MT can benefit from existing semantic resources, and 2) to extend existing semantic tasks to the cross-lingual setting by applying MT systems. Bringing these two research fields together should open up an even more attractive area to explore, and the results are of interest to both the MT community and the semantics community.

Others

Apart from these main research interests, I have also been involved in many other NLP tasks, such as closed-domain question answering (Wang and Yao, 2004), Chinese question classification (Wang, 2005), question answering on speech transcriptions (Neumann and Wang, 2007), NE recognition (Wang et al., 2005), and opinion mining (Yao et al., 2008).

  Concerning cross-field collaborations, I worked with my friends in the computational biology department to extract protein mutations from the biological literature (Wang et al., 2009), and with my friends in Italy on sketch recognition (Avola et al., 2009).

  Please find a complete list of my publications here.