Strong hallucinations from negation and how to fix them

Nicholas Asher; Swarnadeep Bhar

Conference paper, Year: 2024


Nicholas Asher

Role: Author
PersonId : 180278
IdHAL : nicholas-asher
ORCID : 0000-0002-7689-8246
IdRef : 056979193

MEthodes et ingénierie des Langues, des Ontologies et du DIscours

Swarnadeep Bhar

Role: Author
PersonId : 1487638

Abstract

Despite great performance on many tasks, language models (LMs) still struggle with reasoning, sometimes providing responses that cannot possibly be true because they stem from logical incoherence. We call such responses strong hallucinations and prove that they follow from an LM's computation of its internal representations for logical operators and outputs from those representations. Focusing on negation, we provide a novel solution in which negation is treated not as another element of a latent representation, but as an operation over an LM's latent representations that constrains how they may evolve. We show that our approach improves model performance in cloze prompting and natural language inference tasks with negation without requiring training on sparse negative data.

Domains

Computer Science [cs]; Humanities and Social Sciences

File under embargo until Thursday, 10 April 2025


https://hal.science/hal-04878406

Submitted on: Friday, 10 January 2025, 07:01:10

Last modified on: Saturday, 11 January 2025, 03:33:17

Dates and versions

hal-04878406 , version 1 (10-01-2025)

License

Attribution

Identifiers

HAL Id : hal-04878406 , version 1

Cite

Nicholas Asher, Swarnadeep Bhar. Strong hallucinations from negation and how to fix them. Association for Computational Linguistics (ACL), Association for Computational Linguistics, Aug 2024, Bangkok / Thailand, Thailand. pp.12670-12687. ⟨hal-04878406⟩


Collections

UNIV-TLSE2 CNRS UT1-CAPITOLE IRIT IRIT-MELODI ANR IRIT-IA TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP

