commit 78067be
Showing 97 changed files with 1,555 additions and 0 deletions.
16 changes: 16 additions & 0 deletions
lexical-resources/language-models/albertina-pt-br-base.json
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Albertina PT-BR base", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000F-FF45-5", | ||
"Family": "Language Models", | ||
"Description": "This model is for Portuguese spoken in Brazil. It is based on the Transformer neural architecture and is developed over the <a href=\"https://huggingface.co/docs/transformers/model_doc/deberta\">DeBERTa model</a>.", | ||
"Language": ["por"], | ||
"Licence": "MIT", | ||
"Size": [], | ||
"Annotation": ["Baseline"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Baseline", | ||
"Access": { | ||
"Download": "https://huggingface.co/PORTULAN/albertina-ptbr-base" | ||
}, | ||
"Publication": "" | ||
} |
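Every record added in this commit uses the same set of top-level keys, so a small pre-merge check can catch missing fields and stray whitespace in links. The following is a minimal sketch, not part of the repository: the key list is inferred from the records shown here, and treating all of them as required is an assumption.

```python
import json
from typing import List

# Keys observed in the records of this commit; treating them as required
# is an assumption, not a documented schema.
EXPECTED_KEYS = {
    "Name", "URL", "Family", "Description", "Language", "Licence",
    "Size", "Annotation", "Infrastructure", "Group", "Access", "Publication",
}


def check_record(text: str) -> List[str]:
    """Return a list of human-readable problems found in one JSON record."""
    record = json.loads(text)
    problems = []
    missing = EXPECTED_KEYS - set(record)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    # Collect the top-level URL and every link listed under "Access".
    urls = {"URL": record.get("URL", "")}
    urls.update(record.get("Access", {}))
    for label, url in urls.items():
        if url != url.strip():
            problems.append(f"{label}: surrounding whitespace in {url!r}")
        if not url.strip().startswith(("http://", "https://")):
            problems.append(f"{label}: not an http(s) URL: {url!r}")
    return problems
```

Running `check_record` over the text of each new file and failing on a non-empty result would be one way to wire this into review.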
16 changes: 16 additions & 0 deletions
lexical-resources/language-models/albertina-pt-br-no-brwac.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Albertina PT-BR No-brWaC", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000F-FF46-4", | ||
"Family": "Language Models", | ||
"Description": "This is a model for Portuguese spoken in Brazil, trained on data sets other than brWaC. It is developed over the <a href=\"https://huggingface.co/docs/transformers/model_doc/deberta\">DeBERTa model</a>.\nThe model is available for download from Hugging Face.", | ||
"Language": ["por"], | ||
"Licence": "MIT", | ||
"Size": [], | ||
"Annotation": ["Baseline"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Baseline", | ||
"Access": { | ||
"Download": "https://huggingface.co/PORTULAN/albertina-ptbr-nobrwac" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Albertina PT-BR", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000F-FF43-7", | ||
"Family": "Language Models", | ||
"Description": "This model is an encoder of the BERT family, based on the Transformer neural architecture and developed over the <a href=\"https://huggingface.co/docs/transformers/model_doc/deberta\">DeBERTa</a> model. It is for the American variant of Portuguese spoken in Brazil, is trained on the <a href=\"https://huggingface.co/datasets/brwac\">brWaC</a> dataset, and is a larger version of the <a href=\"https://hdl.handle.net/21.11129/0000-000F-FF45-5\">Albertina PT-BR</a> base model.\nThis model is available for download through Hugging Face.", | ||
"Language": ["por"], | ||
"Licence": "MIT", | ||
"Size": [], | ||
"Annotation": ["Baseline"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Baseline", | ||
"Access": { | ||
"Download": "https://huggingface.co/PORTULAN/albertina-ptbr" | ||
}, | ||
"Publication": "" | ||
} |
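The Albertina records point their "Download" link at a Hugging Face model page. As a hedged illustration of how such a link maps to a repository identifier, the sketch below extracts the `owner/name` part from the URL using only the standard library. The helper name `hf_repo_id` is hypothetical and not part of this repository.

```python
from urllib.parse import urlparse


def hf_repo_id(download_url: str) -> str:
    """Extract 'owner/name' from a https://huggingface.co/owner/name URL."""
    parsed = urlparse(download_url.strip())
    if parsed.netloc != "huggingface.co":
        raise ValueError(f"not a Hugging Face URL: {download_url!r}")
    # Drop empty segments produced by leading or trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2:
        raise ValueError(f"expected owner/name in path: {download_url!r}")
    return "/".join(parts)
```

The returned string has the form that `transformers.AutoModel.from_pretrained` accepts as a model identifier, assuming the `transformers` library is installed and the model is publicly available.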
16 changes: 16 additions & 0 deletions
lexical-resources/language-models/albertina-pt-pt-base.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Albertina PT-PT base", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000F-FF44-6", | ||
"Family": "Language Models", | ||
"Description": "This model is for European Portuguese. It is based on the Transformer neural architecture and is developed over the <a href=\"https://huggingface.co/docs/transformers/model_doc/deberta\">DeBERTa model</a>.\nThis model is available for download through Hugging Face.", | ||
"Language": ["por"], | ||
"Licence": "MIT", | ||
"Size": [], | ||
"Annotation": ["Baseline"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Baseline", | ||
"Access": { | ||
"Download": "https://huggingface.co/PORTULAN/albertina-ptpt-base" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Albertina PT-PT", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000F-FF42-8", | ||
"Family": "Language Models", | ||
"Description": "This model is an encoder of the BERT family, based on the Transformer neural architecture and developed over the <a href=\"https://huggingface.co/docs/transformers/model_doc/deberta\">DeBERTa</a> model. It is for European Portuguese, is trained on the <a href=\"https://huggingface.co/datasets/brwac\">brWaC</a> dataset, and is a larger version of the <a href=\"https://hdl.handle.net/21.11129/0000-000F-FF44-6\">Albertina PT-PT</a> base model.\nThis model is available for download through Hugging Face.", | ||
"Language": ["por"], | ||
"Licence": "MIT", | ||
"Size": [], | ||
"Annotation": ["Baseline"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Baseline", | ||
"Access": { | ||
"Download": "https://huggingface.co/PORTULAN/albertina-ptpt" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "BERTimbau - Portuguese BERT-Base language model", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000E-6726-4", | ||
"Family": "Language Models", | ||
"Description": "This is a <a href=\"https://github.com/google-research/bert\">BERT</a> model, trained on <a href=\"https://www.inf.ufrgs.br/pln/wiki/index.php?title=BrWaC#Current_version\">BrWaC</a> (Brazilian Web as Corpus), a large Portuguese corpus, for 1,000,000 steps, using whole-word mask.\nThe model is available for download from the PORTULAN repository.", | ||
"Language": ["por"], | ||
"Licence": "Under negotiation", | ||
"Size": [], | ||
"Annotation": ["Baseline"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Baseline", | ||
"Access": { | ||
"Download": "https://github.com/neuralmind-ai/portuguese-bert/" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "BERTimbau - Portuguese BERT-Large language model", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000E-6725-5", | ||
"Family": "Language Models", | ||
"Description": "This is a <a href=\"https://github.com/google-research/bert\">BERT</a> model, trained on <a href=\"https://www.inf.ufrgs.br/pln/wiki/index.php?title=BrWaC#Current_version\">BrWaC</a> (Brazilian Web as Corpus), a large Portuguese corpus, for 1,000,000 steps, using whole-word mask.\nThe model is available for download from the PORTULAN repository.", | ||
"Language": ["por"], | ||
"Licence": "Under negotiation", | ||
"Size": [], | ||
"Annotation": ["Baseline"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Baseline", | ||
"Access": { | ||
"Download": "https://github.com/neuralmind-ai/portuguese-bert/" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "ccGigafida ARPA language model 1.0", | ||
"URL": "http://hdl.handle.net/11356/1119", | ||
"Family": "Language Models", | ||
"Description": "This model was created from the <a href=\"http://hdl.handle.net/11356/1035\">ccGigafida written corpus of Slovenian</a> using the <a href=\"https://github.com/kpu/kenlm\">KenLM algorithm</a> in the <a href=\"http://www2.statmt.org/moses/\">Moses machine translation framework</a>. It is a general language model of contemporary standard Slovenian language that can be used as a language model in statistical machine translation systems.\nThe model is available for download from the CLARIN.SI repository.", | ||
"Language": ["slv"], | ||
"Licence": "CC BY 4.0", | ||
"Size": [], | ||
"Annotation": ["Baseline"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Baseline", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1119" | ||
}, | ||
"Publication": "" | ||
} |
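The ccGigafida record describes an ARPA-format language model. ARPA files start with a `\data\` header listing the number of n-grams per order, which is a cheap way to inspect a model before loading it. The sketch below reads only that header. It assumes the distributed file is plain-text ARPA (decompress it first if it is not), and the function name is hypothetical.

```python
import re
from typing import Dict, Iterable


def arpa_ngram_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Read the n-gram counts from the \\data\\ header of an ARPA file."""
    counts = {}
    in_header = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue
        match = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line:
            # The first non-empty line that is not a count ends the header.
            break
    return counts
```

Passing an open text file handle works because the function only iterates over lines and stops as soon as the header ends.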
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "CERED baseline models", | ||
"URL": "http://hdl.handle.net/11234/1-3266", | ||
"Family": "Language Models", | ||
"Description": "These models are trained on <a href=\"http://hdl.handle.net/11234/1-3265\">CERED</a>, a dataset created by distant supervision on Czech Wikipedia and Wikidata, and recognize a subset of Wikidata relations.\nThe model is available for download from the LINDAT repository.", | ||
"Language": ["ces"], | ||
"Licence": "CC BY-NC-SA 4.0", | ||
"Size": [], | ||
"Annotation": ["Baseline"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Baseline", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11234/1-3266" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
{ | ||
"Name": "Word embeddings CLARIN.SI-embed", | ||
"URL": "http://hdl.handle.net/11356/1796", | ||
"Family": "Language Models", | ||
"Description": "This is a set of word embeddings for five languages.<ul><li>CLARIN.SI-embed.bg contains word embeddings for Bulgarian induced from the MaCoCu-bg web crawl corpus. The embeddings are based on the skip-gram model of fastText trained on 4,120,343,820 tokens of running text for 2,746,640 lowercased surface forms.</li><li>CLARIN.SI-embed.hr contains word embeddings induced from a large collection of Croatian texts composed of the Croatian web corpus hrWaC, a 400-million-token collection of newspaper texts and MaCoCu-hr. The embeddings are based on the skip-gram model of fastText trained on 4,586,769,197 tokens of running text for 3,406,574 lowercased surface forms.</li><li>CLARIN.SI-embed.mk contains word embeddings induced from a large collection of Macedonian texts crawled from the .mk top-level domain. The embeddings are based on the skip-gram model of fastText trained on 933,231,582 tokens of running text for 986,670 lowercased surface forms.</li><li>CLARIN.SI-embed.sr contains word embeddings induced from the srWaC and MaCoCu-sr web corpora. The embeddings are based on the skip-gram model of fastText trained on 3,434,602,575 tokens of running text for 2,676,036 lowercased surface forms.</li><li>CLARIN.SI-embed.sl contains word embeddings induced from a large collection of Slovene texts composed of existing corpora of Slovene, e.g. GigaFida, Janes, KAS, slWaC, MaCoCu-sl, etc. The embeddings are based on the skip-gram model of fastText trained on 5,791,405,942 tokens of running text for 3,471,054 lowercased surface forms.</li></ul>\nThe models are available for download from the CLARIN.SI repository.", | ||
"Language": ["bul", "hrv", "mkd", "srp", "slv"], | ||
"Licence": "CC BY-SA 4.0", | ||
"Size": [], | ||
"Annotation": ["word embeddings"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Contextual Word Embeddings", | ||
"Access": { | ||
"Download (Bulgarian)": "http://hdl.handle.net/11356/1796", | ||
"Download (Croatian)": "http://hdl.handle.net/11356/1790", | ||
"Download (Macedonian)": "http://hdl.handle.net/11356/1788", | ||
"Download (Serbian)": "http://hdl.handle.net/11356/1789", | ||
"Download (Slovenian)": "http://hdl.handle.net/11356/1791" | ||
}, | ||
"Publication": "" | ||
} |
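The CLARIN.SI-embed record describes fastText skip-gram embeddings keyed by lowercased surface form. As a rough sketch of how such vectors can be read, the code below parses the text layout that fastText writes for `.vec` files: a header line with the vocabulary size and dimension, then one word and its values per line. Whether the downloads from this record use that exact layout is an assumption, and `read_vec` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Optional


def read_vec(lines: Iterable[str], limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Parse word vectors from fastText-style .vec text lines."""
    rows = iter(lines)
    # Header: "<vocabulary size> <dimension>".
    _count, dim = (int(x) for x in next(rows).split())
    vectors = {}
    for line in rows:
        if limit is not None and len(vectors) >= limit:
            break
        parts = line.rstrip("\n").split(" ")
        word = parts[0]
        values = [float(x) for x in parts[1:] if x]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        vectors[word] = values
    return vectors
```

Because the vocabularies listed above run to millions of forms, the `limit` argument makes it possible to look at the first entries without holding the whole file in memory.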
16 changes: 16 additions & 0 deletions
lexical-resources/language-models/classla-stanford-lemma-slv.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 2.0", | ||
"URL": "http://hdl.handle.net/11356/1768", | ||
"Family": "Language Models", | ||
"Description": "The model for lemmatisation of standard Slovenian was built with the <a href=\"https://github.com/clarinsi/classla\">CLASSLA-Stanza tool</a> by training on the <a href=\"http://hdl.handle.net/11356/1747\">SUK training corpus</a> and using the <a href=\"http://hdl.handle.net/11356/1204\">CLARIN.SI-embed.sl word embeddings</a> expanded with the <a href=\"http://hdl.handle.net/11356/1517\">MaCoCu-sl Slovene web corpus</a>. The estimated F1 of the lemma annotations is ~99.7.\nThe model is available for download from the CLARIN.SI repository.", | ||
"Language": ["slv"], | ||
"Licence": "CC BY-SA 4.0", | ||
"Size": [], | ||
"Annotation": ["lemmatisation"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Lemmatisation", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1768" | ||
}, | ||
"Publication": "Ljubešić and Dobrovoljc (2019)" | ||
} |
16 changes: 16 additions & 0 deletions
lexical-resources/language-models/classla-stanford-ner-bul.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The CLASSLA-StanfordNLP model for named entity recognition of standard Bulgarian 1.0", | ||
"URL": "http://hdl.handle.net/11356/1329", | ||
"Family": "Language Models", | ||
"Description": "This model for named entity recognition of standard Bulgarian was built with the <a href=\"https://github.com/clarinsi/classla-stanfordnlp\">CLASSLA-StanfordNLP tool</a> by training on the <a href=\"http://hdl.handle.net/11495/D93F-C6E9-65D9-2\">BulTreeBank training corpus</a> and using the <a href=\"http://hdl.handle.net/11234/1-1989\">CoNLL2017 word embeddings</a>.\nThe model is available for download from the CLARIN.SI repository.", | ||
"Language": ["bul"], | ||
"Licence": "CC BY-SA 4.0", | ||
"Size": [], | ||
"Annotation": ["named entity recognition"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Named Entity Recognition", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1329" | ||
}, | ||
"Publication": "Ljubešić and Dobrovoljc (2019)" | ||
} |
16 changes: 16 additions & 0 deletions
lexical-resources/language-models/classla-stanford-ner-hrv.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The CLASSLA-StanfordNLP model for named entity recognition of standard Croatian 1.0", | ||
"URL": "http://hdl.handle.net/11356/1322", | ||
"Family": "Language Models", | ||
"Description": "This model for named entity recognition of standard Croatian was built with the <a href=\"https://github.com/clarinsi/classla-stanfordnlp\">CLASSLA-StanfordNLP tool</a> by training on the <a href=\"http://hdl.handle.net/11356/1183\">hr500k training corpus</a> and using the <a href=\"http://hdl.handle.net/11356/1205\">CLARIN.SI-embed.hr word embeddings</a>.\nThe model is available for download from the CLARIN.SI repository.", | ||
"Language": ["hrv"], | ||
"Licence": "CC BY-SA 4.0", | ||
"Size": [], | ||
"Annotation": ["named entity recognition"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Named Entity Recognition", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1322" | ||
}, | ||
"Publication": "Ljubešić and Dobrovoljc (2019)" | ||
} |
16 changes: 16 additions & 0 deletions
lexical-resources/language-models/classla-stanford-ner-non-std-hrv.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The CLASSLA-StanfordNLP model for named entity recognition of non-standard Croatian 1.0", | ||
"URL": "http://hdl.handle.net/11356/1340", | ||
"Family": "Language Models", | ||
"Description": "This model for named entity recognition of non-standard Croatian was built with the <a href=\"https://github.com/clarinsi/classla-stanfordnlp\">CLASSLA-StanfordNLP tool</a> by training on the <a href=\"http://hdl.handle.net/11356/1183\">hr500k training corpus</a>, the <a href=\"http://hdl.handle.net/11356/1241\">ReLDI-NormTagNER-hr corpus</a> and the <a href=\"http://hdl.handle.net/11356/1240\">ReLDI-NormTagNER-sr corpus</a>, using the <a href=\"http://hdl.handle.net/11356/1205\">CLARIN.SI-embed.hr word embeddings</a>. The training corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed.\nThe model is available for download from the CLARIN.SI repository.", | ||
"Language": ["Croatian (non-standard)"], | ||
"Licence": "CC BY-SA 4.0", | ||
"Size": [], | ||
"Annotation": ["named entity recognition"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Named Entity Recognition", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1340" | ||
}, | ||
"Publication": "Ljubešić and Dobrovoljc (2019)" | ||
} |
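The non-standard models are described as trained on corpora augmented by repeating parts of the data with diacritics removed. The sketch below illustrates that idea on plain sentences. It is not the CLASSLA training code: the sampling rule (every second sentence) and the mapping of "đ" to "d" are assumptions made for illustration.

```python
import unicodedata
from typing import List, Sequence

# "đ"/"Đ" have no canonical decomposition, so they need an explicit mapping.
# Mapping them to "d"/"D" is an assumption; "dj"/"Dj" is another common choice.
_EXTRA = str.maketrans({"đ": "d", "Đ": "D"})


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text.translate(_EXTRA))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def augment(sentences: Sequence[str], every: int = 2) -> List[str]:
    """Return the sentences plus diacritic-free copies of every `every`-th one."""
    copies = []
    for sentence in sentences[::every]:
        stripped = strip_diacritics(sentence)
        if stripped != sentence:  # skip sentences that had no diacritics
            copies.append(stripped)
    return list(sentences) + copies
```

A real pipeline would apply the same transformation to the token column of an annotated corpus while keeping the labels unchanged, which is what lets the model learn from text typed without diacritics.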
16 changes: 16 additions & 0 deletions
lexical-resources/language-models/classla-stanford-ner-non-std-slv.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The CLASSLA-StanfordNLP model for named entity recognition of non-standard Slovenian 1.0", | ||
"URL": "http://hdl.handle.net/11356/1339", | ||
"Family": "Language Models", | ||
"Description": "This model for named entity recognition of non-standard Slovenian was built with the <a href=\"https://github.com/clarinsi/classla-stanfordnlp\">CLASSLA-StanfordNLP tool</a> by training on the <a href=\"http://hdl.handle.net/11356/1210\">ssj500k training corpus</a> and the <a href=\"http://hdl.handle.net/11356/1238\">Janes-Tag training corpus</a>, using the <a href=\"http://hdl.handle.net/11356/1204\">CLARIN.SI-embed.sl word embeddings</a>. The training corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed.\nThe model is available for download from the CLARIN.SI repository.", | ||
"Language": ["Slovenian (non-standard)"], | ||
"Licence": "CC BY-SA 4.0", | ||
"Size": [], | ||
"Annotation": ["named entity recognition"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Named Entity Recognition", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1339" | ||
}, | ||
"Publication": "Ljubešić and Dobrovoljc (2019)" | ||
} |
16 changes: 16 additions & 0 deletions
lexical-resources/language-models/classla-stanford-ner-non-std-srp.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The CLASSLA-StanfordNLP model for named entity recognition of non-standard Serbian 1.0", | ||
"URL": "http://hdl.handle.net/11356/1341", | ||
"Family": "Language Models", | ||
"Description": "This model for named entity recognition of non-standard Serbian was built with the <a href=\"https://github.com/clarinsi/classla-stanfordnlp\">CLASSLA-StanfordNLP tool</a> by training on the <a href=\"http://hdl.handle.net/11356/1200\">SETimes.SR training corpus</a>, the <a href=\"http://hdl.handle.net/11356/1183\">hr500k training corpus</a>, the <a href=\"http://hdl.handle.net/11356/1240\">ReLDI-NormTagNER-sr corpus</a>, and the <a href=\"http://hdl.handle.net/11356/1241\">ReLDI-NormTagNER-hr corpus</a>, using the <a href=\"http://hdl.handle.net/11356/1206\">CLARIN.SI-embed.sr word embeddings</a>. The training corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed.\nThe model is available for download from the CLARIN.SI repository.", | ||
"Language": ["Serbian (non-standard)"], | ||
"Licence": "CC BY-SA 4.0", | ||
"Size": [], | ||
"Annotation": ["named entity recognition"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Named Entity Recognition", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1341" | ||
}, | ||
"Publication": "Ljubešić and Dobrovoljc (2019)" | ||
} |
16 changes: 16 additions & 0 deletions
lexical-resources/language-models/classla-stanford-ner-slv.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The CLASSLA-StanfordNLP model for named entity recognition of standard Slovenian 1.0", | ||
"URL": "http://hdl.handle.net/11356/1321", | ||
"Family": "Language Models", | ||
"Description": "This model for named entity recognition of standard Slovenian was built with the <a href=\"https://github.com/clarinsi/classla-stanfordnlp\">CLASSLA-StanfordNLP tool</a> by training on the <a href=\"http://hdl.handle.net/11356/1210\">ssj500k training corpus</a> and using the <a href=\"http://hdl.handle.net/11356/1204\">CLARIN.SI-embed.sl word embeddings</a>.\nThe model is available for download from the CLARIN.SI repository.", | ||
"Language": ["slv"], | ||
"Licence": "CC BY-SA 4.0", | ||
"Size": [], | ||
"Annotation": ["named entity recognition"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Named Entity Recognition", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1321" | ||
}, | ||
"Publication": "Ljubešić and Dobrovoljc (2019)" | ||
} |
16 changes: 16 additions & 0 deletions
lexical-resources/language-models/classla-stanford-ner-srp.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The CLASSLA-StanfordNLP model for named entity recognition of standard Serbian 1.0", | ||
"URL": "http://hdl.handle.net/11356/1323", | ||
"Family": "Language Models", | ||
"Description": "This model for named entity recognition of standard Serbian was built with the <a href=\"https://github.com/clarinsi/classla-stanfordnlp\">CLASSLA-StanfordNLP tool</a> by training on the <a href=\"http://hdl.handle.net/11356/1200\">SETimes.SR training corpus</a> and using the <a href=\"http://hdl.handle.net/11356/1206\">CLARIN.SI-embed.sr word embeddings</a>.\nThe model is available for download from the CLARIN.SI repository.", | ||
"Language": ["srp"], | ||
"Licence": "CC BY-SA 4.0", | ||
"Size": [], | ||
"Annotation": ["named entity recognition"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Named Entity Recognition", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1323" | ||
}, | ||
"Publication": "Ljubešić and Dobrovoljc (2019)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The CLASSLA-Stanza model for morphosyntactic annotation of standard Bulgarian 2.1", | ||
"URL": "http://hdl.handle.net/11356/1849", | ||
"Family": "Language Models", | ||
"Description": "The model for morphosyntactic annotation of standard Bulgarian was built with the <a href=\"https://github.com/clarinsi/classla\">CLASSLA-Stanza tool</a> by training on the <a href=\"https://clarino.uib.no/korpuskel/corpora\">BulTreeBank training corpus</a> and using the <a href=\"http://hdl.handle.net/11356/1796\">CLARIN.SI-embed.bg word embeddings</a>. The model produces simultaneously UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~96.83.\nThe model is available for download from the CLARIN.SI repository.", | ||
"Language": ["bul"], | ||
"Licence": "CC BY-SA 4.0", | ||
"Size": [], | ||
"Annotation": ["morphosyntax"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Morphosyntax", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1849" | ||
}, | ||
"Publication": "Ljubešić and Dobrovoljc (2019)" | ||
} |
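The morphosyntactic record states that the model produces UPOS, FEATS and XPOS labels at once. These correspond to three of the ten tab-separated columns of the CoNLL-U format. The sketch below splits one token line into named fields and unpacks the FEATS column. It assumes the tagger output has been serialised as CoNLL-U, and the sample values in the usage note are illustrative rather than real model output.

```python
from typing import Dict

# The ten columns of a CoNLL-U token line, in order.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_token_line(line: str) -> Dict[str, str]:
    """Split one CoNLL-U token line into a field-name to value mapping."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError(f"expected {len(CONLLU_FIELDS)} columns, got {len(values)}")
    return dict(zip(CONLLU_FIELDS, values))


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn 'Gender=Fem|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))
```

For a line such as `1<TAB>книга<TAB>книга<TAB>NOUN<TAB>Ncfsi<TAB>Gender=Fem|Number=Sing<TAB>_<TAB>_<TAB>_<TAB>_`, `parse_token_line` exposes the `upos`, `xpos` and `feats` values by name, and `parse_feats` breaks the feature string into key-value pairs.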