Modular Multimodal Machine Learning for Extraction of Theorems and Proofs in Long Scientific Documents - PaRis AI Research InstitutE
Conference paper. Year: 2024


Shrey Mishra
  • Role: Author
  • PersonId: 1105903
Antoine Gauquier

Abstract

We address the extraction of mathematical statements and their proofs from scholarly PDF articles as a multimodal classification problem, utilizing text, font features, and bitmap image renderings of PDFs as distinct modalities. We propose a modular sequential multimodal machine learning approach specifically designed for extracting theorem-like environments and proofs. This is based on a cross-modal attention mechanism to generate multimodal paragraph embeddings, which are then fed into our novel multimodal sliding window transformer architecture to capture sequential information across paragraphs. Our approach demonstrates performance improvements obtained by transitioning from unimodality to multimodality, and finally by incorporating sequential modeling over paragraphs.
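The two building blocks named in the abstract, cross-modal attention to fuse per-paragraph modality embeddings and a sliding window over the paragraph sequence, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes precomputed embeddings (a text vector per paragraph, plus font and image token vectors), uses a single unparameterized scaled dot-product attention head with text as the query, and fuses by concatenation. The real model uses learned projections and a transformer over each window. All function names here (`cross_modal_attention`, `fuse_paragraph`, `sliding_windows`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one modality's vector (query) over
    another modality's token vectors (keys/values). Assumes non-empty inputs."""
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    # Weighted sum of the value vectors, dimension by dimension.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def fuse_paragraph(text_emb: Vector, font_tokens: List[Vector], image_tokens: List[Vector]) -> Vector:
    """Hypothetical fusion: the text embedding attends to font and image tokens,
    and the three vectors are concatenated into one paragraph embedding."""
    font_ctx = cross_modal_attention(text_emb, font_tokens, font_tokens)
    image_ctx = cross_modal_attention(text_emb, image_tokens, image_tokens)
    return text_emb + font_ctx + image_ctx  # list concatenation


def sliding_windows(n: int, size: int, stride: int) -> List[List[int]]:
    """Index windows over n paragraphs; a final window is added so the tail is covered."""
    if n <= size:
        return [list(range(n))]
    starts = list(range(0, n - size + 1, stride))
    if starts[-1] + size < n:
        starts.append(n - size)
    return [list(range(s, s + size)) for s in starts]


if __name__ == "__main__":
    print(sliding_windows(6, 3, 2))  # prints [[0, 1, 2], [2, 3, 4], [3, 4, 5]]
```

Overlapping windows give each paragraph some left and right context even in long documents. A paragraph that falls into several windows receives several predictions, which a real system would need to reconcile, for example by averaging.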
Main file
main.pdf (643.93 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04805597, version 1 (26-11-2024)

Cite

Shrey Mishra, Antoine Gauquier, Pierre Senellart. Modular Multimodal Machine Learning for Extraction of Theorems and Proofs in Long Scientific Documents. JCDL, Dec 2024, Hong Kong, China. ⟨10.1145/3677389.3702540⟩. ⟨hal-04805597⟩