ParTree - Parallel Treebanks: A multilingual corpus of movie subtitles.

Ref. 2253

Dataset Overview

Dataset title

ParTree - Parallel Treebanks: A multilingual corpus of movie subtitles.

Canonical DOI

Used to cite the entire dataset, regardless of version updates.

https://doi.org/10.48656/zjzp-gj69

DOI

Used to cite a specific dataset version.

https://doi.org/10.48656/5mz4-x435

Dataset description language

English

Data URL

-

Data Availability

-

Dataset Description

A multilingual corpus of movie subtitles aligned at the sentence level. It covers more than 50 languages, with a focus on the Indo-European language family. Morphosyntactic annotation (part of speech, features, dependencies) in Universal Dependencies style is available for 47 languages.

Remarks about the documentation

-

Version number

1.0

Embargo end date

31.12.2023

Publication date

21.03.2023

Version notes

-

Bibliographical citation

Ebert, C., Levshina, N., & Widmer, P. (2023). ParTree - Parallel Treebanks: A multilingual corpus of movie subtitles. (Version 1.0.0) [Data set]. LaRS - Language Repository of Switzerland. https://doi.org/10.48656/5mz4-x435

DIP MD5 hash

4ad59f799658261db407bf7c62c92422

Dataset contents

swissubase_2253_1_0.zip
documentation.pdf
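
Verifying the download

The MD5 hash listed above can be used to check that a downloaded package arrived intact. The following is a minimal sketch, not an official tool of the repository. It assumes that the "DIP MD5 hash" refers to the downloaded archive swissubase_2253_1_0.zip and that the file sits in the current working directory; neither is confirmed by this record. The function names md5_of_file and matches_expected are illustrative.

```python
import hashlib
from pathlib import Path

# Hash value copied from the "DIP MD5 hash" field of this record.
EXPECTED_MD5 = "4ad59f799658261db407bf7c62c92422"

# Assumption: the archive was saved under its listed name in the working directory.
ARCHIVE_PATH = Path("swissubase_2253_1_0.zip")


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare the file's MD5 digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    if not ARCHIVE_PATH.exists():
        print(f"{ARCHIVE_PATH} not found; adjust ARCHIVE_PATH to the download location.")
    elif matches_expected(ARCHIVE_PATH):
        print("MD5 digest matches the value in the record.")
    else:
        print("MD5 digest differs from the value in the record.")
```

A mismatch may mean either a corrupted download or that the hash was computed over a different file than assumed here, so it is worth checking with the repository before discarding the data. MD5 is suitable for detecting accidental corruption only, not deliberate tampering.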