Deep-sequoia: A French corpus with surface and deep syntactic annotations

Deep-sequoia is a corpus of French sentences annotated with both surface and deep syntactic dependency structures. It is freely available under the LGPL-LR license. The latest release is version 8.1 (2017.10.17).

For more details about the annotations, please consult the annotation guidelines (in French), available online or in PDF.

References

Deep syntactic annotations

Marie Candito, Guy Perrier, Bruno Guillaume, Corentin Ribeyre, Karën Fort, Djamé Seddah and Éric de la Clergerie. (2014) Deep Syntax Annotation of the Sequoia French Treebank. Proc. of LREC 2014, Reykjavik, Iceland.

Guy Perrier, Marie Candito, Bruno Guillaume, Corentin Ribeyre, Karën Fort and Djamé Seddah. (2014) Un schéma d’annotation en dépendances syntaxiques profondes pour le français. Proc. of TALN 2014, Marseille, France.

Initial version (constituency trees + surface dependencies)

Marie Candito and Djamé Seddah. (2012) Le corpus Sequoia : annotation syntaxique et exploitation pour l’adaptation d’analyseur par pont lexical. Proc. of TALN 2012, Grenoble, France.

Example

The annotation of the French sentence Europar.550_00040, “Je pense vraiment qu’il convient d’être extrêmement prudent.” (“I really think that we have to be extremely careful.”), in the 3 available formats:

deep_and_surf

Europar.550_00040.deep_and_surf

surf

Europar.550_00040.surf

deep

Europar.550_00040.deep
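
The relation between the three formats can be sketched in code. The Python snippet below is only an illustration under stated assumptions: it assumes that, in the combined deep_and_surf view, a label prefixed with "S:" marks a surface-only edge, a label prefixed with "D:" marks a deep-only edge, and an unprefixed label marks an edge shared by both structures. The Edge type, the project function and the toy edges are hypothetical names and data invented for this sketch; they are not the corpus annotation of the sentence above, and the released files should be checked against the annotation guidelines.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    """A labelled dependency edge (hypothetical helper type for this sketch)."""
    head: int   # token index of the governor
    dep: int    # token index of the dependent
    label: str  # may carry an assumed "S:" or "D:" prefix in the combined view


def project(edges: List[Edge], view: str) -> List[Edge]:
    """Derive the 'surf' or 'deep' view from a combined edge list.

    Assumption: "S:" = surface-only, "D:" = deep-only, no prefix = both.
    """
    if view not in ("surf", "deep"):
        raise ValueError("view must be 'surf' or 'deep'")
    keep_prefix = "S:" if view == "surf" else "D:"
    drop_prefix = "D:" if view == "surf" else "S:"

    result = []
    for edge in edges:
        if edge.label.startswith(drop_prefix):
            continue  # edge belongs only to the other structure
        label = edge.label
        if label.startswith(keep_prefix):
            label = label[len(keep_prefix):]  # strip the view marker
        result.append(Edge(edge.head, edge.dep, label))
    return result


# Toy combined edges, invented for illustration (not taken from the corpus).
toy = [
    Edge(2, 1, "suj"),    # shared by both structures
    Edge(2, 4, "S:obj"),  # surface-only
    Edge(2, 6, "D:obj"),  # deep-only
]

surf_edges = project(toy, "surf")
deep_edges = project(toy, "deep")
```

Under these assumptions, the shared edge appears in both projections, while each prefixed edge appears only in its own view, with the prefix removed.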