Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing

Abstract : This article describes the steps in the construction of EPEC (Reference Corpus for the Processing of Basque). EPEC is a corpus of standard written Basque that has been manually tagged at several levels (morphology, surface syntax, phrases) and is currently being hand-tagged at the deep syntax level following the Dependency Structure-based Scheme. It is intended as a "reference" corpus for the development and improvement of NLP tools for Basque. The corpus has already been used to build tools such as a morphological analyser, a lemmatiser, and a shallow syntactic analyser.

https://artxiker.ccsd.cnrs.fr/artxibo-00080508
Contributor : Izaskun Aldezabal
Submitted on : Thursday, June 22, 2006 - 1:19:14 PM
Last modification on : Thursday, February 21, 2019 - 10:52:48 AM
Long-term archiving on : Monday, September 20, 2010 - 4:04:56 PM

Identifiers

  • HAL Id : artxibo-00080508, version 2

Citation

Itziar Aduriz, Maxux Aranzabe, Jose Maria Arriola, Atziber Atutxa, Arantza Díaz de Ilarraza, et al. Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing. 56, Rodopi. Book series: Language and Computers, pp. 1-15, 2006. ⟨artxibo-00080508v2⟩
