Modular Multimodal Machine Learning for Extraction of Theorems and Proofs in Long Scientific Documents

Shrey Mishra; Antoine Gauquier; Pierre Senellart

doi:10.1145/3677389.3702540

Conference paper. Year: 2024

Shrey Mishra

Role: Author
PersonId: 1105903

Value from Data

Antoine Gauquier

Role: Author
PersonId: 1288282
IdHAL: antoine-gauquier
ORCID: 0009-0005-9573-6364

Value from Data

Pierre Senellart

Role: Author
PersonId: 11778
IdHAL: pierre-senellart
ORCID: 0000-0002-7909-5369
IdRef: 124713769

Value from Data

Institut universitaire de France

Abstract

We address the extraction of mathematical statements and their proofs from scholarly PDF articles as a multimodal classification problem, utilizing text, font features, and bitmap image renderings of PDFs as distinct modalities. We propose a modular sequential multimodal machine learning approach specifically designed for extracting theorem-like environments and proofs. This is based on a cross-modal attention mechanism to generate multimodal paragraph embeddings, which are then fed into our novel multimodal sliding window transformer architecture to capture sequential information across paragraphs. Our approach demonstrates performance improvements obtained by transitioning from unimodality to multimodality, and finally by incorporating sequential modeling over paragraphs.
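
The sequential step described above, where paragraph embeddings are processed in windows so that context from neighbouring paragraphs is available, can be illustrated with a small sketch. This is only an assumption-laden illustration of the general sliding-window idea, not the authors' implementation: the function name `sliding_windows`, the `size` and `stride` parameters, and the tail-handling rule are all hypothetical, and the integers used in the usage note stand in for multimodal paragraph embeddings.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(items: Sequence[T], size: int, stride: int = 1) -> List[List[T]]:
    """Split a sequence of paragraph-level items into overlapping windows.

    Hypothetical helper: `size` and `stride` are illustrative parameters,
    not values taken from the paper.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not items:
        return []
    # A document shorter than one window is returned as a single window.
    if len(items) <= size:
        return [list(items)]
    starts = list(range(0, len(items) - size + 1, stride))
    # Add a final window aligned to the end so trailing paragraphs are covered.
    if starts[-1] + size < len(items):
        starts.append(len(items) - size)
    return [list(items[s:s + size]) for s in starts]


if __name__ == "__main__":
    # Integers stand in for per-paragraph multimodal embeddings.
    for window in sliding_windows(list(range(6)), size=3, stride=2):
        print(window)
```

In a model of the kind the abstract describes, each window would then be passed to a sequence encoder that labels its paragraphs; how overlapping predictions are combined is a design choice this sketch leaves open.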

Keywords

scholarly articles; information extraction; multimodal classifiers

Domains

Artificial Intelligence [cs.AI]

Main file

main.pdf (643.93 KB)

Origin: Files produced by the author(s)

Contributor: Pierre Senellart

https://inria.hal.science/hal-04805597

Submitted on: Tuesday, 26 November 2024, 16:25:19

Last modified on: Friday, 29 November 2024, 03:24:28

Dates and versions

hal-04805597, version 1 (26-11-2024)

License

Attribution

Identifiers

HAL Id: hal-04805597, version 1
DOI: 10.1145/3677389.3702540

Cite

Shrey Mishra, Antoine Gauquier, Pierre Senellart. Modular Multimodal Machine Learning for Extraction of Theorems and Proofs in Long Scientific Documents. JCDL, Dec 2024, Hong Kong, China. ⟨10.1145/3677389.3702540⟩. ⟨hal-04805597⟩

Collections

ENS-PARIS CNRS INRIA INRIA2 GENCI PSL ANR PRAIRIE-IA
