THE 2024 JOINT INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS, LANGUAGE
RESOURCES AND EVALUATION

20-25 MAY, 2024 / TORINO, ITALIA

Tutorials


Tutorials Schedule

Monday, May 20, 2024

Meaning Representations for Natural Languages: Design, Models and Applications

Full day

Instructors: Julia Bonn, Jeffrey Flanigan, Jan Hajič, Ishan Jindal, Yunyao Li and Nianwen Xue

Type: Cutting-Edge

Links: Email – Bios – Tutorials

Abstract: This tutorial introduces a research area that has the potential to create linguistic resources and build computational models that provide critical components for interpretable and controllable NLP systems. While large language models have shown a remarkable ability to generate fluent text, their black-box nature makes it difficult to know where to tweak these models to fix errors, at least for now. For instance, LLMs are known to hallucinate, and there is no mechanism in these models to provide only factually correct answers. Addressing this issue requires, first of all, that the models have access to a body of verifiable facts, and then that they use it effectively. Interpretability and controllability in NLP systems are critical in high-stakes applications such as the medical domain. There has been a steady accumulation of semantically annotated, increasingly rich resources, which can now be derived with high accuracy from raw text. Hybrid models can be used to extract verifiable facts at scale to build controllable and interpretable systems, to ground human-robot interaction (HRI) systems, to support logical reasoning, or to operate in extremely low-resource settings. This tutorial will provide an overview of these semantic representations, the computational models trained on them, and the practical applications built with these representations, as well as future directions.

Navigating the Modern Evaluation Landscape: Considerations in Benchmarks and Frameworks for Large Language Models (LLMs)

Instructors: Leshem Choshen, Ariel Gera, Yotam Perlitz, Michal Shmueli-Scheuer and Gabriel Stanovsky

Type: Cutting-Edge

Links: Website – Email

Abstract: General-Purpose Language Models have changed the world of Natural Language Processing, if not the world itself. The evaluation of such versatile models, while supposedly similar to the evaluation of the generation models that preceded them, in fact presents a host of new evaluation challenges and opportunities. In this tutorial, we will start from the building blocks of evaluation. The tutorial welcomes people from diverse backgrounds and assumes little familiarity with metrics, datasets, prompts and benchmarks. It will lay the foundations and explain the basics and their importance, while touching on the major points and breakthroughs of the recent era of evaluation. It will also compare traditional evaluation methods — which are still widely used — to newly developed methods. We will contrast new approaches with old ones, from evaluating on many-task benchmarks rather than on dedicated datasets to efficiency constraints, and from testing the stability of in-context learning across prompts to using the models themselves as evaluation metrics.

The DBpedia Databus Tutorial: Increase the Visibility and Usability of Your Data

Half day – Afternoon

Instructors: Milan Dojchinovski

Type: Cutting-Edge

Links: Website – Email

Abstract: This half-day tutorial introduces the DBpedia Databus (https://databus.dbpedia.org), a FAIR data publishing platform, addressing the challenges faced by data producers and data consumers. The tutorial covers the management, publishing, and consumption of data on the DBpedia Databus, with an exclusive focus on Linguistic Knowledge Graphs. It also offers practical insights for knowledge graph stakeholders, aiding data integration and accessibility in the Linked Open Data community. Designed for a diverse audience, it fosters hands-on learning to familiarize participants with the DBpedia Databus technology.

NLP for Chemistry — Introduction and Recent Advances

Half day – Afternoon

Instructors: Camilo Thorne and Saber Akhondi

Type: Introductory to Adjacent Areas

Links: Website – Email

Abstract: In this half-day tutorial we will give an introductory overview of a number of recent applications of natural language processing to a relatively underrepresented application domain: chemistry. Specifically, we will see how neural language models (transformers) can be applied, often with near-human performance, to chemical text mining, reaction extraction, and, more importantly, computational chemistry (forward and backward synthesis of chemical compounds). At the same time, a number of gold standards for experimentation have been made available to the research community, academic and otherwise. Theoretical results will be supported, whenever possible, by system demonstrations in the form of Jupyter notebooks. This tutorial targets an audience interested in bioinformatics and biomedical applications, but presupposes no advanced knowledge of either.
Tutorial materials: https://github.com/camilothorne/nlp-4-chemistry-lrec-2024

Formal Semantic Controls over Language Models

Half day – Morning

Instructors: Danilo Silva de Carvalho, Yingji Zhang and André Freitas

Type: Cutting-Edge

Links: Website – Email

Abstract: Text embeddings provide a concise representation of the semantics of sentences and larger spans of text, rather than individual words, capturing a wide range of linguistic features. They have found increasing application in a variety of NLP tasks, including machine translation and natural language inference. While most recent breakthroughs in task performance are being achieved by large-scale distributional models, there is a growing disconnect between their knowledge representation and traditional semantics, which hinders efforts to capture such knowledge in human-interpretable form or to explain model inference behaviour.
In this tutorial, we cover, from the basics to cutting-edge research, the analysis and control of text representations, aiming to narrow the gap between deep latent semantics and formal symbolic representations. This includes considerations on knowledge formalisation, the linguistic information that can be extracted and measured from distributional models, and intervention techniques that enable explainable reasoning and controllable text generation, covering methods ranging from pooling to LLM-based approaches.

Towards a Human-Computer Collaborative Scientific Paper Lifecycle: A Pilot Study and Hands-On Tutorial

Half day – Morning

Instructors: Qingyun Wang, Carl Edwards, Heng Ji and Tom Hope

Type: Cutting-Edge

Links: Website – Email

Abstract: Due to the rapid growth of publications varying in quality, there is a pressing need to help scientists digest and evaluate relevant papers, thereby facilitating scientific discovery. This raises a number of urgent questions; however, computer-human collaboration in the scientific paper lifecycle is still in the exploratory stage and lacks a unified framework for analyzing the relevant tasks. Additionally, with their recent significant success, large language models (LLMs) have increasingly played an important role in academic writing. In this tutorial, we aim to provide an all-encompassing overview of the paper lifecycle, detailing how machines can augment every stage of the research process for the scientist, including scientific literature understanding, experiment development, manuscript draft writing, and finally draft evaluation. This tutorial is devised for researchers interested in this rapidly developing field of NLP-augmented paper writing. The tutorial will also feature a session of hands-on exercises during which participants can guide machines in generating ideas and automatically composing key paper elements. Furthermore, we will address current challenges, explore future directions, and discuss potential ethical issues. A toolkit designed for human-computer collaboration throughout the paper lifecycle will also be made publicly available. The tutorial materials are online at https://sites.google.com/view/coling2024-paper-lifecycle/.

Tuesday, May 21, 2024

From Multimodal LLM to Human-level AI: Modality, Instruction, Reasoning, Efficiency and Beyond

Half day – Afternoon

Instructors: Hao Fei, Yuan Yao, Zhuosheng Zhang, Fuxiao Liu, Ao Zhang and Tat-Seng Chua

Type: Cutting-Edge

Links: Website – Email

Abstract: Artificial intelligence (AI) encompasses knowledge acquisition and real-world grounding across various modalities. Multimodal large language models (MLLMs), a multidisciplinary research field, have recently garnered growing interest in both academia and industry, reflecting an unprecedented push toward achieving human-level AI via MLLMs. These large models offer an effective vehicle for understanding, reasoning, and planning by integrating and modeling diverse information modalities, including language, visual, auditory, and sensory data. This tutorial aims to deliver a comprehensive review of cutting-edge research in MLLMs, focusing on four key areas: MLLM architecture design, instructional learning, multimodal reasoning, and the efficiency of MLLMs. We will explore technical advancements, synthesize key challenges, and discuss potential avenues for future research. All the resources and materials are available at https://mllm2024.github.io/COLING2024

Knowledge Editing for Large Language Models

Half day – Morning

Instructors: Ningyu Zhang, Yunzhi Yao and Shumin Deng

Type: Cutting-Edge

Links: Website – Email

Abstract: Even with their impressive abilities, Large Language Models (LLMs) such as ChatGPT are not immune to issues of factual accuracy or logical consistency. Concretely, the key concern is how to seamlessly update these LLMs to correct mistakes without resorting to an exhaustive retraining or continuous training procedure, both of which can demand significant computational resources and time. Thus, the capability to edit LLMs offers an efficient solution to alter a model’s behavior, notably within a distinct area of interest, without negatively impacting its performance on other tasks. Through this tutorial, we strive to acquaint interested NLP researchers with recent and emerging techniques for editing LLMs. Specifically, we aim to present a systematic and current overview of cutting-edge methods, supplemented with practical tools, and to unveil new research opportunities for our audience. All the valuable resources can be accessed at https://github.com/zjunlp/KnowledgeEditingPapers.

Saturday, May 25, 2024

Geo-Cultural Representation and Inclusion in Language Technologies

Half day – Morning

Instructors: Sunipa Dev and Rida Qadri

Type: Cutting-Edge

Abstract: Training and evaluation of language models increasingly rely on semi-structured data annotated by humans, along with techniques such as RLHF that are growing in usage across the board. As a result, both the data and the human perspectives involved in this process play a key role in what our models take as ground truth. As annotation tasks become increasingly subjective and culturally complex, it is unclear how much of their socio-cultural identity annotators bring to bear when responding to tasks. We also currently do not have ways to integrate rich and diverse community perspectives into our language technologies. Accounting for such cross-cultural differences in interacting with technology is an increasingly crucial step for evaluating AI harms holistically. Without this, the state-of-the-art AI models being deployed risk causing unprecedented biases at a global scale. In this tutorial, we will take an interactive approach, using several different types of annotation tasks to investigate together how our different socio-cultural perspectives and lived experiences influence what we consider appropriate representations of global concepts.

Mining, Assessing, and Improving Arguments in NLP and the Social Sciences

Full day

Instructors: Gabriella Lapesa, Eva Maria Vecchi, Serena Villata and Henning Wachsmuth

Type: Introductory to Computational Linguistics/NLP Topics

Links: Website

Abstract: Computational argumentation is an interdisciplinary research field, connecting Natural Language Processing (NLP) to other disciplines such as the social sciences. Recent research has concentrated on argument quality assessment: what makes an argument good or bad? This tutorial will have a strong interdisciplinary and interactive nature, and will be structured along three main coordinates: (1) the notions of argument quality (AQ) across disciplines (how do we recognize good and bad arguments?), with a particular focus on the interface between Argument Mining (AM) and Deliberation Theory; (2) the modeling of subjectivity (who argues to whom; what are their beliefs?); and (3) the generation of improved arguments (what makes an argument better?). The tutorial is based on a previous version presented by the same authors at EACL 2023 (https://sites.google.com/view/argmintutorialeacl2023/home-page), but it will also touch upon a series of topics that are particularly relevant for the LREC-COLING audience (the issue of data quality for the assessment of AQ; the interdisciplinary application of AM and AQ in a text-as-data approach to Political Science), in line with developments in NLP (LLMs for AQ assessment) and relevant to the societal applications of AQ assessment (bias and debiasing). We will involve the participants in two annotation studies on the assessment and the improvement of argument quality.

Hallucination in Large Language Models

Half day – Morning

Instructors: Vipula Rawte, Aman Chadha, Amit Sheth and Amitava Das

Type: Cutting-Edge

Links: Email – Bios

Abstract: In the fast-paced domain of Large Language Models (LLMs), the issue of hallucination is a prominent challenge. Despite continuous endeavors to address this concern, it remains a highly active area of research within the LLM landscape. Grasping the intricacies of this problem can be daunting, especially for those new to the field. This tutorial aims to bridge this knowledge gap by introducing the emerging realm of hallucination in LLMs. It will comprehensively explore the key aspects of hallucination, including benchmarking, detection, and mitigation techniques. Furthermore, we will delve into the specific constraints and shortcomings of current approaches, providing valuable insights to guide future research efforts for participants.

Addressing Bias and Hallucination in Large Language Models

Half day – Afternoon

Instructors: Nihar Sahoo, Ashita Saxena, Kishan Maharaj, Arif Ahmad, Abhijit Mishra and Pushpak Bhattacharyya

Type: Cutting-Edge

Abstract: In the landscape of natural language processing (NLP), addressing the challenges of bias and hallucination is paramount to ensuring the ethical and unbiased development of Large Language Models (LLMs). This tutorial delves into the intricate dimensions of LLMs, shedding light on the critical importance of understanding and mitigating the profound impacts of bias and hallucination. The tutorial begins with discussions on the complexity of bias propagation in LLM development, where we dissect its origins and far-reaching impacts, along with automatic evaluation metrics for bias measurement. We then present innovative methodologies for mitigating diverse forms of bias in both static and contextualized word embeddings, along with robust benchmarking strategies. In addition, the tutorial explores the interlinkage between hallucination and bias in LLMs, shedding light on how bias can be perceived as a hallucination problem. Furthermore, we also discuss cognitively inspired deep learning frameworks for hallucination detection that leverage human gaze behavior. Ultimately, this cutting-edge tutorial serves as a guiding light, equipping participants with indispensable tools and insights to navigate the ethical complexities of LLMs, thus paving the way for the development of unbiased and ethically robust NLP systems.

Knowledge-enhanced Response Generation in Dialogue Systems: Current Advancements and Emerging Horizons

Half day – Morning

Instructors: Priyanshu Priya, Deeksha Varshney, Mauajama Firdaus and Asif Ekbal

Type: Introductory to Computational Linguistics/NLP Topics

Links: Website – Bios – Email

Abstract: This tutorial provides an in-depth exploration of Knowledge-enhanced Dialogue Systems (KEDS), diving into their foundational aspects, methodologies, advantages, and practical applications. Topics include the distinction between internal and external knowledge integration, diverse methodologies employed in grounding dialogues, and innovative approaches to leveraging knowledge graphs for enhanced conversation quality. Furthermore, the tutorial touches upon the rise of biomedical text mining, the advent of domain-specific language models, and the challenges and strategies specific to medical dialogue generation. The primary objective is to give attendees a comprehensive understanding of KEDS. By delineating the nuances of these systems, the tutorial aims to elucidate their significance, highlight advancements made using deep learning, and pinpoint the current challenges. Special emphasis is placed on showcasing how KEDS can be fine-tuned for domain-specific requirements, with a spotlight on the healthcare sector. The tutorial is crafted for both beginners and intermediate researchers in the dialogue systems domain, with a focus on those keen on advancing research in KEDS. It will also be valuable for practitioners in sectors like healthcare, seeking to integrate advanced dialogue systems.