Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing
Book section in Corpus Linguistics Around the World, ed. Andrew Wilson, Paul Rayson, and Dawn Archer
Year: 2006

Authors

Itziar I. Aduriz
Maxux M. Aranzabe
Jose Maria J. Arriola
Atziber A. Atutxa
Arantza Díaz de Ilarraza
Nerea N. Ezeiza
Koldo K. Gojenola
Maite M. Oronoz
Aitor A. Soroa
Ruben R. Urizar

Abstract

This article describes the steps in the construction of EPEC (Reference Corpus for the Processing of Basque). EPEC is a corpus of standard written Basque that has been manually tagged at several levels (morphology, surface syntax, phrases) and is currently being hand-tagged at the deep syntax level following the Dependency Structure-based Scheme. It is intended to serve as a "reference" corpus for developing and improving NLP tools for Basque. The corpus has already been used to build tools such as a morphological analyser, a lemmatiser, and a shallow syntactic analyser.

Main file: CLAW2006.pdf (310.78 KB)

Dates and versions

artxibo-00080508, version 1 (19-06-2006)
artxibo-00080508, version 2 (22-06-2006)

Identifiers

  • HAL Id: artxibo-00080508, version 2

Cite

Itziar I. Aduriz, Maxux M. Aranzabe, Jose Maria J. Arriola, Atziber A. Atutxa, Arantza Díaz de Ilarraza, et al. Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing. In: Andrew Wilson, Paul Rayson, and Dawn Archer (eds.), Corpus Linguistics Around the World. Book series: Language and Computers, 56. Rodopi, pp. 1-15, 2006. ⟨artxibo-00080508v2⟩