Methodology and Foundational Work

Our methodology has three parts: mining data from corpora, motivated as much by linguistic and cognitive principles as by empirical considerations; developing computational frameworks based on the corpus analysis; and rigorously evaluating the resulting software systems. Our work has contributed significantly to the field as a whole, including to resource creation and validation. The lab's most cited paper, Di Eugenio & Glass (2004), takes a critical look at the Kappa coefficient of inter-annotator agreement. We still strongly believe in marrying linguistic insights and statistics when we can show that linguistic knowledge results in better models. For example, we developed an innovative discourse parser that incorporates verb semantics and performs better than models based only on lexical and syntactic information (Subba & Di Eugenio, 2009). This work also produced a publicly available corpus of texts annotated with discourse structure (please send inquiries about it to bdieugen@uic.edu). Similarly, we explored a variety of parameter settings across different corpora for centering, a theory of reference (Poesio et al., 2004). In work on recognizing dialogue acts, we showed that information about the hierarchical structure of the dialogue (dialogue games) improves empirical models (Di Eugenio et al., 2010b). Building on some older work on discourse cues (Di Eugenio et al., 1997), we have recently explored the societally important application of translating from Italian into Italian Sign Language (Lugaresi & Di Eugenio, 2013).