Seminar for the partners of the NoRDF chair:
Knowledge Graph Completion using Embeddings
Mehwish Alam, Associate Professor, Télécom Paris, Institut Polytechnique de Paris
Knowledge Graphs (KGs) have recently gained attention due to their applicability to diverse fields of research, including downstream tasks such as web search, recommender systems, and question answering. These tasks can also take advantage of Large Language Models (LLMs), which have recently revolutionized the landscape of Artificial Intelligence. Despite their remarkable performance on various NLP tasks, LLMs suffer from hallucination and are opaque models that lack interpretability. A potential solution to these problems is to inject knowledge from KGs into LLMs, since KGs are known for their reasoning capabilities and for producing interpretable results.
KGs and LLMs are thus complementary and can benefit from each other's capabilities. KGs, however, suffer from incompleteness, whether generated manually or automatically. Recent years have witnessed many studies on link prediction using KG embeddings, one of the mainstream tasks in KG completion. Most existing methods learn latent representations of the entities and relations, whereas only a few also consider contextual information and the textual descriptions of the entities. This talk will focus in particular on MADLINK, an attentive encoder-decoder based link prediction approach that leverages the contextual information of the entities as well as their descriptions. A newly created set of benchmark datasets for KG completion, named LiterallyWikidata and extracted from Wikidata and Wikipedia, will also be introduced; it has been prepared with the main focus of providing benchmark datasets for multimodal KG embedding models. The talk will also give an overview of methods for entity type prediction, a subtask of KG completion.
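To make the idea of link prediction with KG embeddings concrete, here is a minimal sketch using the classic TransE scoring function, which models a relation as a translation in embedding space. This is an illustration of the general embedding-based approach only, not of MADLINK itself; the entities, relation, and random vectors are toy assumptions (trained embeddings would be learned from data).

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy vocabularies; real embeddings would be trained, not random.
entities = {"Paris": 0, "France": 1, "Berlin": 2, "Germany": 3}
relations = {"capital_of": 0}

E = rng.normal(size=(len(entities), dim))   # entity embeddings
R = rng.normal(size=(len(relations), dim))  # relation embeddings

def score(head, rel, tail):
    """TransE score: smaller ||h + r - t|| means a more plausible triple."""
    return float(np.linalg.norm(E[entities[head]] + R[relations[rel]] - E[entities[tail]]))

# Link prediction: rank all candidate tails for the query (Paris, capital_of, ?)
candidates = sorted(entities, key=lambda t: score("Paris", "capital_of", t))
```

With trained embeddings, the correct tail entity would rank near the top of `candidates`; here the ranking is arbitrary because the vectors are random.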
Statutory reasoning for tax law; or, Can GPT-4 do my taxes?
Nils Holzenberger, Associate Professor, Télécom Paris, Institut Polytechnique de Paris
Legal professionals routinely need to determine which laws apply to a specific legal case. Statutory reasoning is the task of determining whether a given legal rule applies to a case, with both expressed in natural language. Statutory reasoning is a basic skill for lawyers, and computational statutory reasoning is a fundamental task for legal AI. The core challenge is developing models that can utilize prescriptive rules stated in natural language and generalize to new rules. The SARA dataset, introduced in 2020, is a benchmark for statutory reasoning in the context of US federal tax law. Since then, there have been a number of attempts at solving it. Each new general-purpose NLP model, from BERT to large language models, has improved performance on the task. However, many gaps remain, and the SARA dataset has served to uncover major flaws in state-of-the-art models. I will conclude this talk by describing open problems in statutory reasoning.
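The task definition above can be sketched as a simple data structure: each example pairs a rule and a case, both in natural language, with a label indicating whether the rule applies. The statute and case texts below are invented for illustration and do not reproduce the actual SARA format or content.

```python
from dataclasses import dataclass

@dataclass
class StatutoryReasoningExample:
    statute: str  # prescriptive rule, stated in natural language
    case: str     # facts of the case, stated in natural language
    label: str    # does the rule apply? "entailment" or "contradiction"

# Hypothetical example, loosely in the spirit of tax-law statutory reasoning.
example = StatutoryReasoningExample(
    statute="A person who is married may file a joint return with their spouse.",
    case="Alice and Bob are married. Alice files a joint return with Bob.",
    label="entailment",
)
```

A model for this task takes the `statute` and `case` fields as input and must predict the `label`, generalizing to statutes it has never seen.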