Deep-sequoia: A multilayer French corpus
Deep-sequoia is a corpus of French sentences annotated with several annotation layers, each produced by a separate project. It is freely available under the LGPL-LR license. The latest version is 9.2, released in October 2020 (see the README file).
Annotation layers:
Layer | Code | Project | Version | Guidelines |
---|---|---|---|---|
Surface syntax | S | Sequoia (2012) | See Sequoia | included in deep syntax annotation guidelines (in French) |
Deep syntax | D | Deep-sequoia (2014) | Since 1.0 | deep syntax annotation guidelines (in French) |
MWEs and Named Entities | P | PARSEME-FR (2019) | Since 9.0 | PARSEME-FR guide |
Supersense annotation | F | FRSEMCOR (2020) | Since 9.1 | FRSEMCOR guide (in French) |
References
Initial version (constituency trees + surface dependencies)
Marie Candito and Djamé Seddah. (2012) Le corpus Sequoia : annotation syntaxique et exploitation pour l’adaptation d’analyseur par pont lexical, Proceedings of TALN’2012, Grenoble, France.
Marie Candito and Djamé Seddah. (2012) Effectively long-distance dependencies in French: annotation and parsing evaluation, Proceedings of TLT’11, 2012, Lisbon, Portugal.
Deep-sequoia: deep syntactic annotations
Marie Candito, Guy Perrier, Bruno Guillaume, Corentin Ribeyre, Karën Fort, Djamé Seddah and Éric de la Clergerie. (2014) Deep Syntax Annotation of the Sequoia French Treebank. Proc. of LREC 2014, Reykjavik, Iceland.
Guy Perrier, Marie Candito, Bruno Guillaume, Corentin Ribeyre, Karën Fort and Djamé Seddah. (2014) Un schéma d’annotation en dépendances syntaxiques profondes pour le français. Proc. of TALN 2014, Marseille, France.
PARSEME-FR: MWE and named entities annotation
Marie Candito, Mathieu Constant, Carlos Ramisch, Agata Savary, Yannick Parmentier, Caroline Pasquer and Jean-Yves Antoine. (2017) Annotation d’expressions polylexicales verbales en français, Proc. of TALN 2017 - short papers, Orléans, France.
In preparation: A French corpus annotated for multi-word expressions and named entities.
FRSEMCOR: Supersense annotation
Lucie Barque, Pauline Haas, Richard Huyghe, Delphine Tribout, Marie Candito, Benoît Crabbé and Vincent Segonne. (2020) Annotating a French Corpus with Supersenses, Proc. of LREC 2020, Marseille, France.