THE 2024 JOINT INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS, LANGUAGE
RESOURCES AND EVALUATION

20-25 MAY, 2024 / TORINO, ITALIA

Tutorials

Tutorials Schedule

Monday, May 20, 2024

Meaning Representations for Natural Languages: Design, Models and Applications

Full day

Instructors: Julia Bonn, Jeffrey Flanigan, Jan Hajič, Ishan Jindal, Yunyao Li and Nianwen Xue

Type: Cutting-Edge

Links: Email – Bios

Abstract: This tutorial introduces a research area that has the potential to create linguistic resources and build computational models that provide critical components for interpretable and controllable NLP systems. While large language models have shown a remarkable ability to generate fluent text, the black-box nature of these models makes it difficult to know where to tweak them to fix errors, at least for now. For instance, LLMs are known to hallucinate, and these models have no mechanism for providing only factually correct answers. Addressing this issue requires, first of all, that the models have access to a body of verifiable facts, and then that they use it effectively. Interpretability and controllability in NLP systems are critical in high-stakes applications such as the medical domain. There has been a steady accumulation of semantically annotated, increasingly rich resources, and such annotations can now be derived with high accuracy from raw text. Hybrid models can be used to extract verifiable facts at scale to build controllable and interpretable systems, to ground human-robot interaction (HRI) systems, to support logical reasoning, or to operate in extremely low-resource settings. This tutorial will provide an overview of these semantic representations and the computational models trained on them, as well as the practical applications built with these representations, including future directions.
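
One widely used representation of this kind is Abstract Meaning Representation (AMR), which encodes a sentence as a rooted, directed graph of concepts and semantic roles. As a minimal illustration (our own sketch, not tutorial material), the open-source penman library can parse and re-serialise such graphs:

# pip install penman
import penman

# AMR for "The boy wants to go": nodes are concepts (want-01, boy, go-02),
# edges are semantic roles; the variable b is reused for the shared argument.
amr = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
            :ARG0 b))
"""

graph = penman.decode(amr)

# Each triple is (source, role, target); ':instance' triples name concepts.
for source, role, target in graph.triples:
    print(source, role, target)

# Re-serialise the graph back to PENMAN notation.
print(penman.encode(graph))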

Navigating the Modern Evaluation Landscape: Considerations in Benchmarks and Frameworks for Large Language Models (LLMs)

Instructors: Leshem Choshen, Ariel Gera, Yotam Perlitz, Michal Shmueli-Scheuer and Gabriel Stanovsky

Type: Cutting-Edge

Links: Website – Email

Abstract: General-purpose language models have changed the world of natural language processing, if not the world itself. Evaluating such versatile models, while superficially similar to evaluating the generation models that preceded them, in fact presents a host of new challenges and opportunities. In this tutorial, we will start from the building blocks of evaluation. The tutorial welcomes people from diverse backgrounds and assumes little familiarity with metrics, datasets, prompts and benchmarks. It will lay the foundations and explain the basics and their importance, while touching on the major points and breakthroughs of the recent era of evaluation. It will also compare traditional evaluation methods, which are still widely used, to newly developed ones. We will contrast new with old approaches: from evaluating on many-task benchmarks rather than on dedicated datasets, to efficiency constraints, to testing stability and prompt sensitivity in in-context learning, to using the models themselves as evaluation metrics.
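
One of the newest approaches mentioned above, using the models themselves as evaluation metrics (often called "LLM-as-a-judge"), can be sketched in a few lines. Here call_llm is a hypothetical placeholder for any model API, and the prompt template and 1-5 scoring scheme are illustrative assumptions, not a method taught in the tutorial:

# Minimal sketch of LLM-as-a-judge evaluation.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

JUDGE_TEMPLATE = (
    "You are grading a model answer.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Model answer: {candidate}\n"
    "Reply with a single integer score from 1 (wrong) to 5 (perfect)."
)

def judge(question: str, reference: str, candidate: str) -> int:
    reply = call_llm(JUDGE_TEMPLATE.format(
        question=question, reference=reference, candidate=candidate))
    return int(reply.strip().split()[0])  # assumes the score comes first

def evaluate(examples, generate):
    """Average judge score of `generate` over (question, reference) pairs."""
    scores = [judge(q, ref, generate(q)) for q, ref in examples]
    return sum(scores) / len(scores)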

The DBpedia Databus Tutorial: Increase the Visibility and Usability of Your Data

Half day – Afternoon

Instructor: Milan Dojchinovski

Type: Cutting-Edge

Links: Website – Email

Abstract: This half-day tutorial introduces the DBpedia Databus (https://databus.dbpedia.org), a FAIR data publishing platform that addresses the challenges faced by data producers and data consumers. The tutorial covers managing, publishing, and consuming data on the DBpedia Databus, with an exclusive focus on linguistic knowledge graphs. It also offers practical insights for knowledge graph stakeholders, aiding data integration and accessibility in the Linked Open Data community. Designed for a diverse audience, it fosters hands-on learning to familiarize participants with the DBpedia Databus technology.
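
To give a flavour of the consumption side, the sketch below retrieves download URLs from the Databus SPARQL endpoint in Python. The endpoint path and the assumption that Databus metadata is exposed through standard DCAT properties are ours; verify both against the Databus documentation:

import requests

ENDPOINT = "https://databus.dbpedia.org/sparql"  # assumed endpoint path

# Assumes Databus metadata uses the standard DCAT vocabulary.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?dataset ?file WHERE {
  ?dataset dcat:distribution ?dist .
  ?dist dcat:downloadURL ?file .
} LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["dataset"]["value"], row["file"]["value"])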

NLP for Chemistry — Introduction and Recent Advances

Half day – Afternoon

Instructors: Camilo Thorne and Saber Akhondi

Type: Introductory to Adjacent Areas

Links: Website – Email

Abstract: In this half-day tutorial we will give an introductory overview of a number of recent applications of natural language processing to a relatively underrepresented application domain: chemistry. Specifically, we will see how neural language models (transformers) can be applied, often with near-human performance, to chemical text mining, reaction extraction, and, more importantly, computational chemistry (forward and backward synthesis of chemical compounds). At the same time, a number of gold standards for experimentation have been made available to the research community, academic and otherwise. Theoretical results will be, whenever possible, supported by system demonstrations in the form of Jupyter notebooks. This tutorial targets an audience interested in bioinformatics and biomedical applications, but presupposes no advanced knowledge of either.
Materials: https://github.com/camilothorne/nlp-4-chemistry-lrec-2024
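
As a flavour of transformer-based chemical text mining, the sketch below tags compound mentions with a Hugging Face token-classification pipeline. The checkpoint name is a placeholder, not a model endorsed by the tutorial; substitute any chemical NER model from the model hub:

from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="path/to/chemical-ner-model",  # hypothetical placeholder checkpoint
    aggregation_strategy="simple",       # merge word pieces into entity spans
)

text = ("Treatment of benzaldehyde with sodium borohydride in methanol "
        "afforded benzyl alcohol in 92% yield.")

for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))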

Formal Semantic Controls over Language Models

Half day – Morning

Instructors: Danilo Silva de Carvalho, Yingji Zhang and André Freitas

Type: Cutting-Edge

Links: Website – Email

Abstract: Text embeddings provide a concise representation of the semantics of sentences and larger spans of text, rather than individual words, capturing a wide range of linguistic features. They have found increasing application in a variety of NLP tasks, including machine translation and natural language inference. While most recent breakthroughs in task performance are being achieved by large-scale distributional models, there is a growing disconnect between their knowledge representation and traditional semantics, which hinders efforts to capture such knowledge in human-interpretable form or to explain model inference behaviour.
In this tutorial, we examine research on the analysis and control of text representations, from the basics to the cutting edge, aiming to narrow the gap between deep latent semantics and formal symbolic representations. This includes considerations on knowledge formalisation, the linguistic information that can be extracted and measured from distributional models, and intervention techniques that enable explainable reasoning and controllable text generation, covering methods ranging from pooling to LLM-based approaches.
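
At the pooling end of the spectrum mentioned above, a sentence embedding can be built by mean-pooling a transformer's token embeddings. A minimal sketch (bert-base-uncased is merely a convenient public checkpoint, not the tutorial's choice of model):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence: str) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)   # zero out padding
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (1, dim)

a = embed("A cat sat on the mat.")
b = embed("A feline rested on the rug.")
print(torch.cosine_similarity(a, b).item())  # high for paraphrases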

Towards a Human-Computer Collaborative Scientific Paper Lifecycle: A Pilot Study and Hands-On Tutorial

Half day – Morning

Instructors: Qingyun Wang, Carl Edwards, Heng Ji and Tom Hope

Type: Cutting-Edge

Links: WebsiteEmail

Abstract: Due to the rapid growth of publications of varying quality, there is a pressing need to help scientists digest and evaluate relevant papers, thereby facilitating scientific discovery. This raises a number of urgent questions; however, computer-human collaboration in the scientific paper lifecycle is still at an exploratory stage and lacks a unified framework for analyzing the relevant tasks. Additionally, with the recent significant success of large language models (LLMs), they have increasingly played an important role in academic writing. In this tutorial, we aim to provide an all-encompassing overview of the paper lifecycle, detailing how machines can augment every stage of the research process for the scientist, including scientific literature understanding, experiment development, manuscript drafting, and, finally, draft evaluation. This tutorial is devised for researchers interested in this rapidly developing field of NLP-augmented paper writing. The tutorial will also feature a session of hands-on exercises during which participants can guide machines in generating ideas and automatically composing key paper elements. Furthermore, we will address current challenges, explore future directions, and discuss potential ethical issues. A toolkit designed for human-computer collaboration throughout the paper lifecycle will also be made publicly available. The tutorial materials are online at https://sites.google.com/view/coling2024-paper-lifecycle/.
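
The hands-on session mentioned above revolves around guiding a model to compose paper elements. A minimal, tool-agnostic sketch of that idea follows; call_llm is a hypothetical placeholder for any model API, and the prompt is illustrative rather than the tutorial's actual exercise material:

# Sketch: drafting an abstract from a title and contribution bullets.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

def draft_abstract(title: str, contributions: list[str]) -> str:
    bullets = "\n".join(f"- {c}" for c in contributions)
    prompt = (
        f"Draft a four-sentence abstract for a paper titled '{title}'.\n"
        f"It should cover these contributions:\n{bullets}"
    )
    return call_llm(prompt)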