Commit 08c3fcd (1 parent: d50f2ce)
Showing 47 changed files with 719 additions and 0 deletions.
@@ -0,0 +1,15 @@
{
"Name": "Aalto Finnish Parliament ASR Corpus 2008-2020",
"URL": "http://urn.fi/urn:nbn:fi:lb-2022052002",
"Family": "Parliamentary corpora",
"Description": "This corpus, which consists of both audio recordings and transcriptions, was extracted by the Aalto Speech Recognition group from the transcripts and videos of the plenary sessions of the Parliament of Finland. The original session transcripts and videos are available on the Parliament's websites. The corpus is split into three parts:\n<ol>\n<li>the 2015–2020 set</li>\n<li>the 2008–2016 set</li>\n<li>development and test sets</li>\n</ol>\nThe corpus is available for download from the Language Bank of Finland.",
"Languages": ["fin"], | ||
"License": "CLARIN PUB", | ||
"Size": ["119.3 million words", "3,130 hours of recordings"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "http://urn.fi/urn:nbn:fi:lb-2022052003" | ||
}, | ||
"Publication": "" | ||
}
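Every record shown in this diff uses the same eleven top-level fields. A consistency check over such records could look like the following minimal Python sketch. Treating all eleven fields as required, and expecting `Languages`, `Size` and `Annotation` to be lists and `Access` to be an object, are assumptions inferred from the records here rather than a documented schema; the function name `check_record` is hypothetical.

```python
import json

# Field names used by every record in this diff. Treating them all as
# required, and the type expectations below, are assumptions.
EXPECTED_KEYS = {
    "Name", "URL", "Family", "Description", "Languages", "License",
    "Size", "Annotation", "Infrastructure", "Access", "Publication",
}
LIST_KEYS = {"Languages", "Size", "Annotation"}


def check_record(text: str) -> list[str]:
    """Return the problems found in one JSON corpus record (empty list if none)."""
    record = json.loads(text)
    problems = []
    keys = set(record)
    if EXPECTED_KEYS - keys:
        problems.append(f"missing keys: {sorted(EXPECTED_KEYS - keys)}")
    if keys - EXPECTED_KEYS:
        problems.append(f"unexpected keys: {sorted(keys - EXPECTED_KEYS)}")
    for key in sorted(LIST_KEYS & keys):
        if not isinstance(record[key], list):
            problems.append(f"{key} should be a list")
    if "Access" in record and not isinstance(record["Access"], dict):
        problems.append("Access should be an object")
    return problems


# A record shaped like the Aalto entry above, with the description shortened.
sample = json.dumps({
    "Name": "Aalto Finnish Parliament ASR Corpus 2008-2020",
    "URL": "http://urn.fi/urn:nbn:fi:lb-2022052002",
    "Family": "Parliamentary corpora",
    "Description": "...",
    "Languages": ["fin"],
    "License": "CLARIN PUB",
    "Size": ["119.3 million words", "3,130 hours of recordings"],
    "Annotation": [],
    "Infrastructure": "CLARIN",
    "Access": {"Download": "http://urn.fi/urn:nbn:fi:lb-2022052003"},
    "Publication": "",
})
print(check_record(sample))  # an empty list means no problems were found
```

Returning a list of messages instead of raising on the first problem lets a whole directory of records be checked in one pass.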
corpora/parliamentary-corpora/archives-parlementaires.json (15 additions, 0 deletions)
@@ -0,0 +1,15 @@
{
"Name": "Archives Parlementaires",
"URL": "https://sul-philologic.stanford.edu/philologic/archparl/",
"Family": "Parliamentary corpora",
"Description": "The Archives parlementaires (AP) is a chronologically ordered, edited collection of sources on the French Revolution. It was conceived in the mid-19th century as a project to produce a definitive record of parliamentary deliberations, and it also includes letters, reports, speeches, and other first-hand accounts from a great variety of published and archival sources. The French Revolution Digital Archive (FRDA) currently contains the AP volumes covering the years 1787-1794, which can be searched using ARTFL's open-source PhiloLogic 4 platform. The texts have been marked up in TEI so that speakers, places, dates, and terms in the published index can be easily found. Users can view either the scanned images of the AP pages or the texts alone.",
"Languages": ["fra"], | ||
"License": "", | ||
"Size": [], | ||
"Annotation": [], | ||
"Infrastructure": "Other", | ||
"Access": { | ||
"Concordancer": "https://sul-philologic.stanford.edu/philologic/archparl/" | ||
}, | ||
"Publication": "" | ||
}
@@ -0,0 +1,15 @@
{
"Name": "Parliamentary Debates on Europe at the Assemblée nationale (2002-2012)",
"URL": "https://hdl.handle.net/11403/fr-parl/v1",
"Family": "Parliamentary corpora",
"Description": "The corpus contains French parliamentary debates from 2002 to 2012. The contextual metadata cover the dates of the European Council meetings, the main topic(s) of each meeting and the place where it was held, as well as information about the government and the legislative session. The speaker metadata comprise name, gender, occupation, parliamentary group, political orientation and membership of the majority or the opposition.\nThe corpus is available for download from Ortolang.",
"Languages": ["fra"], | ||
"License": "CC-BY", | ||
"Size": ["137,000 tokens"], | ||
"Annotation": ["contextual and speaker metadata"], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/11403/fr-parl/v1" | ||
}, | ||
"Publication": "Truan and Romary (2021)" | ||
}
@@ -0,0 +1,14 @@
{
"Name": "Korpusbasierte Analyse österreichischer Parlamentsreden",
"URL": "https://homepages.uni-regensburg.de/~sic07430/",
"Family": "Parliamentary corpora",
"Description": "The corpus contains Austrian parliamentary debates from 2013 to 2015. It is annotated with the <a href=\"https://nlp.stanford.edu/software/tagger.shtml\">Stanford Tagger</a>.\nThe corpus is currently not available.",
"Languages": ["German (Austrian)"], | ||
"License": "", | ||
"Size": ["1.2 million tokens"], | ||
"Annotation": ["tokenised", "PoS-tagged"], | ||
"Infrastructure": "Other", | ||
"Access": { | ||
}, | ||
"Publication": "Sippl et al. (2016)" | ||
}
@@ -0,0 +1,15 @@
{
"Name": "Corpus of Bulgarian Political and Journalistic Speech", | ||
"URL": "http://www.political.webclark.org/?locale=bg", | ||
"Family": "Parliamentary corpora", | ||
"Description": "The corpus contains Bulgarian parliamentary debates from 2006 to 2012.\nThe corpus is available through a dedicated concordancer.", | ||
"Languages": ["bul"], | ||
"License": "", | ||
"Size": ["10 million tokens"], | ||
"Annotation": ["tokenised", "PoS-tagged", "lemmatised"], | ||
"Infrastructure": "Other", | ||
"Access": { | ||
"Concordancer": "http://www.political.webclark.org/?locale=bg" | ||
}, | ||
"Publication": "" | ||
}
@@ -0,0 +1,15 @@
{
"Name": "Parliamentary Debates on Europe at the Bundestag (1998-2015)",
"URL": "https://hdl.handle.net/11403/de-parl/v1",
"Family": "Parliamentary corpora",
"Description": "The corpus contains German parliamentary debates from 1998 to 2015. The contextual metadata cover the dates of the European Council meetings, the main topic(s) of each meeting and the place where it was held, as well as information about the government and the legislative session. The speaker metadata comprise name, gender, occupation, parliamentary group, political orientation and membership of the majority or the opposition.\nThe corpus is available for download from Ortolang.",
"Languages": ["deu"], | ||
"License": "CC-BY", | ||
"Size": ["417,000 tokens"], | ||
"Annotation": ["contextual and speaker metadata"], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/11403/de-parl/v1" | ||
}, | ||
"Publication": "Truan and Romary (2021)" | ||
}
@@ -0,0 +1,15 @@
{
"Name": "The Chinese/English Political Interpreting Corpus (CEPIC)", | ||
"URL": "https://digital.lib.hkbu.edu.hk/cepic/", | ||
"Family": "Parliamentary corpora", | ||
"Description": "The CEPIC consists of transcripts of speeches delivered by top political figures from Hong Kong, Beijing, Washington DC and London, as well as their translated/interpreted texts.\nThe main speech types of CEPIC include the reading of government reports such as policy addresses and budget speeches, Q&A at press conferences, parliamentary debates, as well as remarks delivered at bilateral meetings.\nThe corpus features a parallel display of up to six versions of the same speech segment, aligned at paragraph level.\nThe corpus is available for online querying through a dedicated concordancer.", | ||
"Languages": ["zho", "eng"], | ||
"License": "<a href=\"https://digital.lib.hkbu.edu.hk/cepic/terms.php\">Terms of Use</a>", | ||
"Size": ["6.5 million words"], | ||
"Annotation": ["PoS-tagged", "prosodic and paralinguistic features"], | ||
"Infrastructure": "Other", | ||
"Access": { | ||
"Concordancer": "https://digital.lib.hkbu.edu.hk/cepic/search.php" | ||
}, | ||
"Publication": "Pan (2019)" | ||
}
@@ -0,0 +1,16 @@
{
"Name": "Czech Parliamentary Meetings", | ||
"URL": "http://hdl.handle.net/11858/00-097C-0000-0005-CF9C-4", | ||
"Family": "Parliamentary corpora", | ||
"Description": "The corpus contains recordings of the parliamentary sessions as well as corresponding transcriptions.\nThe corpus is available for download from LINDAT and through the concordancer KonText.", | ||
"Languages": ["ces"], | ||
"License": "CC-BY", | ||
"Size": ["88 hours", "0.5 million tokens"], | ||
"Annotation": ["error correction of transcriptions", "division into speech sections with speaker information"], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Concordancer": "http://lindat.mff.cuni.cz/services/kontext/first_form?corpname=czechparl_2012_03_28_cs_w", | ||
"Download": "http://hdl.handle.net/11858/00-097C-0000-0005-CF9C-4" | ||
}, | ||
"Publication": "" | ||
}
@@ -0,0 +1,15 @@
{
"Name": "CzechParl", | ||
"URL": "https://www.muni.cz/en/research/publications/914268", | ||
"Family": "Parliamentary corpora", | ||
"Description": "The corpus contains Czech parliamentary debates from 1993 to 2010. It is annotated with <a href=\"https://nlp.fi.muni.cz/projekty/ajka/\">ajka</a>.\nThe corpus is available through the Sketch Engine.", | ||
"Languages": ["ces"], | ||
"License": "", | ||
"Size": ["81.9 million tokens"], | ||
"Annotation": ["tokenised", "MSD-tagged and lemmatised"], | ||
"Infrastructure": "Other", | ||
"Access": { | ||
"Concordancer": "https://the.sketchengine.co.uk/login/?next=%2Fcorpus%2Ffirst_form%3Fcorpname%3Dpreloaded%2Fczechparl2012%3B" | ||
}, | ||
"Publication": "Jakubíček and Kovář (2010)" | ||
}
@@ -0,0 +1,15 @@
{
"Name": "The Danish Parliament Corpus 2009 - 2017, v2", | ||
"URL": "http://hdl.handle.net/20.500.12115/44", | ||
"Family": "Parliamentary corpora", | ||
"Description": "The corpus contains Danish parliamentary debates from 2009 to 2017.\nThe corpus is available for download from the DK-CLARIN repository.", | ||
"Languages": ["dan"], | ||
"License": "CC-BY", | ||
"Size": ["40.6 million words"], | ||
"Annotation": ["no linguistic annotation"], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/20.500.12115/44" | ||
}, | ||
"Publication": "" | ||
}
@@ -0,0 +1,16 @@
{
"Name": "DutchParl",
"URL": "http://search.politicalmashup.nl/about.html",
"Family": "Parliamentary corpora",
"Description": "The corpus contains Dutch parliamentary debates from 1814 to 2014. It is annotated with <a href=\"https://github.com/proycon/python-frog\">Frog</a>. See also the <a href=\"http://schema.politicalmashup.nl/\">information on the schema</a> used.\nThe corpus can be downloaded after contacting the authors, and it is also accessible online through the Political Mashup environment.",
"Languages": ["nld"], | ||
"License": "", | ||
"Size": ["800 million tokens"], | ||
"Annotation": ["tokenised", "PoS-tagged", "lemmatised"], | ||
"Infrastructure": "Other", | ||
"Access": { | ||
"Concordancer": "http://search.politicalmashup.nl/", | ||
"Download": "http://data.politicalmashup.nl/permanent/" | ||
}, | ||
"Publication": "Marx and Schuth (2010)" | ||
}
@@ -0,0 +1,15 @@
{
"Name": "EPIC-UdS",
"URL": "http://hdl.handle.net/21.11119/0000-0008-F519-8",
"Family": "Parliamentary corpora",
"Description": "This is a parallel and comparable corpus of speeches delivered in the European Parliament; the corpus follows the European Parliament Interpreting Corpora tradition of the <a href=\"http://hdl.handle.net/11585/132580\">EPIC</a> and <a href=\"https://research.flw.ugent.be/en/projects/epicg-european-parliament-interpreting-corpus-ghent-step-further\">EPICG</a> corpora. It contains original speeches from 2008 to 2013 by English, German, and Spanish native speakers and their interpretations (English to and from German; Spanish to English).\nAll transcripts in the corpus are based on videos of the <a href=\"https://www.europarl.europa.eu/plenary/en/debates-video.html\">European Parliament Proceedings</a> published by the European Parliament.\nThe annotation includes typical characteristics of spoken language such as false starts, hesitations and truncated words. To obtain better results for source-target alignment and sentence parsing, the transcripts were segmented using a main clause approach, i.e. compound sentences were split into their main clauses. For the second version of the corpus, the transcripts were processed clause by clause with the <a href=\"https://spacy.io/\">spaCy</a> NLP tools; the data is encoded in CoNLL-U and provides universal PoS tags, fine-grained language-specific PoS tags, and <a href=\"https://universaldependencies.org/\">Universal Dependencies</a> syntactic relations. All data was enriched with relevant metadata such as source language, name of the original speaker, speech timing, mode of delivery and delivery rate.\nThe corpus is available for download from CLARIN-D (Saarland University B-centre).",
"Languages": ["eng", "deu", "spa"], | ||
"License": "CC BY-NC-SA 4.0", | ||
"Size": ["350,000 tokens", "20,000 sentences"], | ||
"Annotation": ["tokenised", "PoS-tagged", "syntactically parsed", "speech phenomena"], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/21.11119/0000-0008-F519-8" | ||
}, | ||
"Publication": "Przybyl et al. (2022)" | ||
}
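The EPIC-UdS description says the second version is encoded in CoNLL-U, which stores one token per line in ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with `#` comment lines and a blank line after each sentence. A minimal reader could look like the Python sketch below. It skips multiword-token ranges and empty nodes, and the sample sentence and its annotation are invented for illustration, not taken from the corpus. For production use, an established CoNLL-U library is probably the safer choice.

```python
# The ten CoNLL-U columns defined by the Universal Dependencies format.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats",
          "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> list[list[dict]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):      # sentence-level comments, e.g. "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue                  # multiword-token ranges and empty nodes
        current.append(dict(zip(FIELDS, cols)))
    if current:                       # input may lack a final blank line
        sentences.append(current)
    return sentences


# Invented example sentence, not taken from EPIC-UdS.
sample = (
    "# text = We thank you.\n"
    "1\tWe\twe\tPRON\tPRP\t_\t2\tnsubj\t_\t_\n"
    "2\tthank\tthank\tVERB\tVBP\t_\t0\troot\t_\t_\n"
    "3\tyou\tyou\tPRON\tPRP\t_\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
)

for sent in read_conllu(sample):
    forms = {tok["id"]: tok["form"] for tok in sent}
    for tok in sent:
        head = forms.get(tok["head"], "ROOT")  # head "0" marks the root
        print(f"{tok['form']:<6} {tok['upos']:<6} {tok['deprel']:<6} -> {head}")
```

Keeping every column as a string keeps the reader close to the file format. Converting `id` and `head` to integers would be a natural next step once empty nodes are handled.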
@@ -0,0 +1,15 @@
{
"Name": "European Parliament Proceedings Parallel Corpus 1996-2011, parallel corpus Greek-English",
"URL": "http://hdl.grnet.gr/11500/ATHENA-0000-0000-23DE-F",
"Family": "Parliamentary corpora",
"Description": "This corpus is a bilingual Greek-English subset of the <a href=\"https://www.clarin.eu/resource-families/parliamentary-corpora#Europarl\">Europarl parallel corpus</a>.\nThe corpus is available for download from the CLARIN:EL repository.",
"Languages": ["ell", "eng"],
"License": "CC ZERO", | ||
"Size": ["31.9 million words (English)", "1.2 million sentences (Greek)"], | ||
"Annotation": ["sentence aligned"], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "http://hdl.grnet.gr/11500/ATHENA-0000-0000-23DE-F" | ||
}, | ||
"Publication": "" | ||
}
@@ -0,0 +1,15 @@
{
"Name": "Europarl: European Parliament Proceedings Parallel Corpus 1996-2011",
"URL": "https://www.statmt.org/europarl/",
"Family": "Parliamentary corpora",
"Description": "This corpus contains parliamentary debates from the European Parliament from 1996 to 2011.\nThe corpus is available for download from a dedicated webpage.",
"Languages": ["21 languages"],
"License": "CC0",
"Size": ["33.7 million tokens"],
"Annotation": ["sentence aligned"],
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "http://www.statmt.org/europarl/" | ||
}, | ||
"Publication": "" | ||
}
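Sentence-aligned parallel data of this kind is commonly distributed as a pair of plain-text files with one sentence per line, where line *i* of one file is the translation of line *i* of the other. Assuming that layout (the file names in the comment below are also an assumption), the pairs could be read with a short Python sketch like this one. It skips pairs where either side is empty and raises an error if the two files differ in length.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple


def aligned_pairs(src: TextIO, tgt: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned streams.

    Pairs where either side is empty are skipped; streams of different
    length raise ValueError, since the alignment would then be unreliable.
    """
    for lineno, (s, t) in enumerate(zip_longest(src, tgt), start=1):
        if s is None or t is None:
            raise ValueError(f"streams differ in length at line {lineno}")
        s, t = s.strip(), t.strip()
        if s and t:
            yield s, t


# In-memory stand-ins for two aligned files; with real data one would open
# e.g. europarl-v7.de-en.de and europarl-v7.de-en.en (names are an assumption)
# with encoding="utf-8".
de = io.StringIO("Guten Morgen.\n\nDanke.\n")
en = io.StringIO("Good morning.\n(Applause)\nThank you.\n")
pairs = list(aligned_pairs(de, en))
print(pairs)
```

`zip_longest` is used instead of `zip` so that a truncated file is reported rather than silently shortening the output.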
@@ -0,0 +1,16 @@
{
"Name": "German Political Speeches Corpus",
"URL": "https://www.dwds.de/d/korpora/politische_reden",
"Family": "Parliamentary corpora",
"Description": "The corpus contains speeches by 200 important political figures from 1982 to 2020.\nA large part of the corpus consists of speeches by the holders of the four highest German state offices, namely the Federal President, the Federal Chancellor, the President of the Bundestag and the Foreign Minister, with terms of office between 1982 and 2020.\nThe corpus can be browsed online through the DWDS platform, and an XML-encoded subset of 6,685 speeches up to 2019 can be downloaded.",
"Languages": ["deu"], | ||
"License": "CC BY-SA 4.0", | ||
"Size": ["15,240 speeches", "27 million texts"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Concordancer": "https://www.dwds.de/r?corpus=politische_reden", | ||
"Download": "http://adrien.barbaresi.eu/corpora/speeches/" | ||
}, | ||
"Publication": "Barbaresi (2018)" | ||
}
@@ -0,0 +1,15 @@
{
"Name": "German Parliamentary Corpus (GerParCor)",
"URL": "https://github.com/texttechnologylab/GerParCor",
"Family": "Parliamentary corpora",
"Description": "This corpus contains (mostly historical) German-language parliamentary proceedings from the 19th, 20th, and 21st centuries, including state- and federal-level data. Additionally, the corpus contains conversions of scanned protocols, in particular of protocols printed in <a href=\"https://en.wikipedia.org/wiki/Fraktur\">Fraktur</a>, produced by an OCR process based on <a href=\"https://github.com/tesseract-ocr/tesseract\">Tesseract</a>. All protocols were preprocessed with the NLP pipeline <a href=\"https://spacy.io/usage/v3/\">spaCy v3</a> and automatically annotated with metadata on their session date. The corpus is made available in the XML format of the <a href=\"https://uima.apache.org/\">UIMA project</a>.\nThe corpus is available for download from GitHub.",
"Languages": ["deu"], | ||
"License": "AGPL-3.0 Licence", | ||
"Size": [], | ||
"Annotation": ["tokenised", "PoS-tagged", "lemmatised", "sentence segmented", "NER-tagged", "morphology", "dependency parsed"], | ||
"Infrastructure": "Other", | ||
"Access": { | ||
"Download": "https://github.com/texttechnologylab/GerParCor" | ||
}, | ||
"Publication": "Abrami et al. (2022)" | ||
}
@@ -0,0 +1,15 @@
{
"Name": "HanDeSeT: Hansard Debates with Sentiment Tags",
"URL": "https://data.mendeley.com/datasets/xsvp45cbt4/2",
"Family": "Parliamentary corpora",
"Description": "This corpus contains English parliamentary debates from 1997 to 2017.\nThe corpus is available for download from a dedicated webpage.",
"Languages": ["eng"],
"License": "Open Parliament Licence V3.0 and Open Data Commons Open Database License (ODbL)",
"Size": ["1,251 motion-speech units taken from 129 separate debates"],
"Annotation": ["sentiment tags"], | ||
"Infrastructure": "Other", | ||
"Access": { | ||
"Download": "https://data.mendeley.com/datasets/xsvp45cbt4/2" | ||
}, | ||
"Publication": "Abercrombie and Batista-Navarro (2018)" | ||
}
@@ -0,0 +1,15 @@
{
"Name": "Hansard corpus", | ||
"URL": "http://www.clarin.ac.uk/hansard-corpus", | ||
"Family": "Parliamentary corpora", | ||
"Description": "The corpus contains British parliamentary debates from 1803 to 2005. It is semantically tagged with the <a href=\"http://ucrel.lancs.ac.uk/usas/\">USAS semantic tagger</a> and the <a href=\"http://www.sciencedirect.com/science/article/pii/S0885230816302121\">Historical Thesaurus Semantic Tagger</a> (HTST).\nThe corpus is available through a dedicated concordancer.", | ||
"Languages": ["eng"], | ||
"License": "", | ||
"Size": ["1.6 billion tokens"], | ||
"Annotation": ["tokenised", "PoS-tagged", "lemmatised", "semantic tagging"], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Concordancer": "https://www.hansard-corpus.org/x.asp" | ||
}, | ||
"Publication": "Rayson et al. (2015)" | ||
}
@@ -0,0 +1,15 @@
{
"Name": "Hellenic Parliament Minutes (1989-1994, 1997-2018)",
"URL": "http://hdl.grnet.gr/11500/AEGEAN-0000-0000-57FA-5",
"Family": "Parliamentary corpora",
"Description": "The corpus contains Greek parliamentary debates from two periods: 1989-1994 and 1997-2018.\nThe corpus is available for download from the CLARIN:EL repository.",
"Languages": ["ell"], | ||
"License": "CC-BY-NC", | ||
"Size": ["181 million words"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "http://hdl.grnet.gr/11500/AEGEAN-0000-0000-57FA-5" | ||
}, | ||
"Publication": "" | ||
}
corpora/parliamentary-corpora/house-of-commons-europe.json (15 additions, 0 deletions)
@@ -0,0 +1,15 @@
{
"Name": "Parliamentary Debates on Europe at the House of Commons (1998-2015)",
"URL": "https://hdl.handle.net/11403/uk-parl/v1",
"Family": "Parliamentary corpora",
"Description": "The corpus contains British parliamentary debates from 1998 to 2015. The contextual metadata cover the dates of the European Council meetings, the main topic(s) of each meeting and the place where it was held, as well as information about the government and the legislative session. The speaker metadata comprise name, gender, occupation, parliamentary group, political orientation and membership of the majority or the opposition.\nThe corpus is available for download from Ortolang.",
"Languages": ["eng"], | ||
"License": "CC-BY", | ||
"Size": ["190,000 tokens"], | ||
"Annotation": ["contextual and speaker metadata"], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/11403/uk-parl/v1" | ||
}, | ||
"Publication": "Truan and Romary (2021)" | ||
}
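In these records, the `Access` object maps an access mode (`Download` or `Concordancer`) to a URL, and it may be empty, as in the Austrian record. The following minimal Python sketch groups corpora by access mode. It uses trimmed copies of three records from this diff, and the helper names `by_access_mode` and `without_access` are hypothetical.

```python
# Trimmed copies of three records from this diff (only the fields used here).
records = [
    {
        "Name": "Czech Parliamentary Meetings",
        "Access": {
            "Concordancer": "http://lindat.mff.cuni.cz/services/kontext/first_form?corpname=czechparl_2012_03_28_cs_w",
            "Download": "http://hdl.handle.net/11858/00-097C-0000-0005-CF9C-4",
        },
    },
    {
        "Name": "Hansard corpus",
        "Access": {"Concordancer": "https://www.hansard-corpus.org/x.asp"},
    },
    {
        "Name": "Korpusbasierte Analyse österreichischer Parlamentsreden",
        "Access": {},
    },
]


def by_access_mode(records: list[dict]) -> dict[str, list[str]]:
    """Map each access mode (e.g. "Download") to the names of corpora offering it."""
    modes: dict[str, list[str]] = {}
    for rec in records:
        for mode in rec.get("Access", {}):
            modes.setdefault(mode, []).append(rec["Name"])
    return modes


def without_access(records: list[dict]) -> list[str]:
    """Names of corpora whose Access object is empty or missing."""
    return [rec["Name"] for rec in records if not rec.get("Access")]


print(by_access_mode(records))
print(without_access(records))
```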