added corpus query tools
kreetrapper committed Sep 14, 2024
1 parent ef1d1dd commit b6b9eba
Showing 95 changed files with 1,532 additions and 0 deletions.
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/aconcorde.json
@@ -0,0 +1,16 @@
{
"Name": "aConCorde",
"URL": "https://www.andy-roberts.net/coding/aconcorde",
"Family": "Corpus query tools",
"Description": "This is a multi-lingual concordance tool. Originally developed for native Arabic concordance, it posses basic concordance functionality, as well as English and Arabic interfaces.",
"Functionality": ["Concordancing/querying"]
"Languages": ["Language independent"],
"License": "No licence",
"Size": [],
"Platform": ["Platform-independent (java)"],
"Infrastructure": "CLARIN",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/antconc.json
@@ -0,0 +1,16 @@
{
"Name": "Antconc",
"URL": "https://www.laurenceanthony.net/software/antconc/",
"Family": "Corpus query tools",
"Description": "This is a freeware corpus analysis toolkit for concordancing and text analysis.\nOnline videos and manuals from the creator and community (<a href=\"https://groups.google.com/g/AntConc\">Google Group</a>).",
"Functionality": ["Concordancing/querying"]
"Languages": ["Language independent"],
"License": "<a href=\"https://www.laurenceanthony.net/software/antconc/releases/AntConc4011/license.pdf\">Proprietary</a>",
"Size": [],
"Platform": ["Linux", "MacOS", "Windows"],
"Infrastructure": "External",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/antpconc.json
@@ -0,0 +1,16 @@
{
"Name": "AntPConc",
"URL": "http://www.laurenceanthony.net/software/antpconc/",
"Family": "Corpus query tools",
"Description": "This is a freeware parallel corpus analysis toolkit for concordancing and text analysis using UTF-8 encoded text files.",
"Functionality": ["Parallel Concordancing/querying"]
"Languages": ["Language independent"],
"License": "",
"Size": [],
"Platform": ["Linux", "MacOS", "Windows"],
"Infrastructure": "External",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/autosearch.json
@@ -0,0 +1,16 @@
{
"Name": "AutoSearch",
"URL": "https://autosearch.ivdnt.org/",
"Family": "Corpus query tools",
"Description": "This tool allows users to upload corpora annotated at the token level for (extended) part of speech, lemma and word form in FoLiA or TEI format, after which the corpus can be searched for these properties with a Corpus of Contemporary Dutch-like interface",
"Functionality": ["Querying/concordancing", "corpus upload and analysis"]
"Languages": ["Language independent"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "CLARIAH-NL",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/bncweb-lancaster.json
@@ -0,0 +1,16 @@
{
"Name": "BNCweb (Lancaster)",
"URL": "http://bncweb.lancs.ac.uk/bncwebSignup/user/login.php",
"Family": "Corpus query tools",
"Description": "This tool is a modified version of <a href=\"https://cqpweb.lancs.ac.uk/\">CQPweb</a> for the <a href=\"http://www.natcorp.ox.ac.uk/\">British National Corpus</a>. It allows a number of search options: publication date, text medium, author gender, target audience, genre, author domicile.\nRegistration is required to use the tool.",
"Functionality": ["Querying/concordancing"]
"Languages": ["eng"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "CLARIN-UK",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/casualconc.json
@@ -0,0 +1,16 @@
{
"Name": "CasualConc",
"URL": "https://sites.google.com/site/casualconc/",
"Family": "Corpus query tools",
"Description": "This is a concordance program that runs natively on macOS 11.3 or later.and can generate KWIC concordance lines, word clusters, collocation analysis, and word count.",
"Functionality": ["Concordancing/querying"]
"Languages": ["Language independent"],
"License": "No licence",
"Size": [],
"Platform": ["MacOS"],
"Infrastructure": "External",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/catma.json
@@ -0,0 +1,16 @@
{
"Name": "CATMA",
"URL": "http://catma.de/",
"Family": "Corpus query tools",
"Description": "The acronym CATMA stands for Computer Assisted Text Markup and Analysis.\nIt is possible to upload one's own corpus with this tool.",
"Functionality": ["Querying/concordancing", "corpus upload and analysis"]
"Languages": ["deu"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "External",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/chn.json
@@ -0,0 +1,16 @@
{
"Name": "Concordancer of Corpus Hedendaags Nederlands (Corpus of Contemporary Dutch)",
"URL": "https://chn.ivdnt.org/",
"Family": "Corpus query tools",
"Description": "This is a dedicated query tool, built on <a href=\"http://inl.github.io/BlackLab/\">BlackLab</a> software, for <a href=\"https://chn.ivdnt.org/\">Corpus Hedendaags Nederlands (Corpus of Contemporary Dutch)</a>, a corpus of more than 800,000 texts taken from newspapers, magazines, news broadcasts and legal writings (1814–2013).\nThe corpus is a combination of the 5, 27 and 38 million word corpora and the PAROLE Corpus, supplemented with newspaper texts from NRC and De Standaard (until 2013).\nRegistration is required for using this tool. Shibboleth log-in is supported.",
"Functionality": ["Querying/concordancing"]
"Languages": ["nld"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "External",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/cintil.json
@@ -0,0 +1,16 @@
{
"Name": "CINTIL Concordancer",
"URL": "https://portulanclarin.net/workbench/cintil-concordancer/",
"Family": "Corpus query tools",
"Description": "This is a freely available online concordancing service to support the research usage of the <a href=\"http://cintil.ul.pt/cintilfeatures.html#corpus\">CINTIL Corpus</a>. The CINTIL concordancer allows the use of patterns to specify the occurrences to be retrieved. This permits to uncover linguistic structures of high complexity and use this service as a powerful research tool.",
"Functionality": ["Querying/concordancing"]
"Languages": ["por"],
"License": "<a href=\"https://portulanclarin.net/workbench/license/\">Proprietary</a>",
"Size": [],
"Platform": [],
"Infrastructure": "PORTULAN CLARIN",
"Access": {
"Browse": ""
},
"Publication": "Barreto et al. (2006)"
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/clan.json
@@ -0,0 +1,16 @@
{
"Name": "CLAN",
"URL": "http://dali.talkbank.org/clan/",
"Family": "Corpus query tools",
"Description": "The CLAN Programs are downloaded, installed, and used as a single application. Functionally, however, CLAN has two parts. The first part is the CLAN editor which can be used to edit files in either CHAT or CA (Conversation Analysis) format. The editor also provides a wide range of additional functions, such as audio and video playback, linkage to audio and video, fonts for Roman and non-Roman orthographies, data validation, adding codes to files, and shipping data to other programs. The second part of CLAN is the set of data analysis programs. These programs are run from a separate window called the Commands window. The results of the analytic programs are sent to the CLAN Output window.\nThe tool is only compatible with <a href=\"https://www.talkbank.org/\">TalkBank</a> corpora that have CHAT annotation.\nAn <a href=\"https://talkbank.org/manuals/CLAN.pdf\">online manual</a> is available.",
"Functionality": ["Concordancing/querying"]
"Languages": ["Language independent"],
"License": "GPL2 (source code)",
"Size": [],
"Platform": ["Windows", "MacOS", "Source code provided for Linux users"],
"Infrastructure": "TalkBank",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/clark.json
@@ -0,0 +1,16 @@
{
"Name": "CLaRK",
"URL": "http://bultreebank.org/bg/clark/",
"Family": "Corpus query tools",
"Description": "This tool is an XML-based system for corpus linguistics, primarily for corpus construction, but also with functionality for analysing and exploring corpora.\nThe support team is reachable through <a href=\"mailto:[email protected]\">email</a>. A <a href=\"http://bultreebank.org/en/clark/clark-system-online-manual\">user manual</a> is also available.",
"Functionality": ["Concordancing/querying", "corpus building"]
"Languages": ["Language independent"],
"License": "",
"Size": [],
"Platform": ["Platform-independent"],
"Infrastructure": "CLARIN-BG",
"Access": {
"Browse": ""
},
"Publication": "Simov et al. (2014)"
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/clic.json
@@ -0,0 +1,16 @@
{
"Name": "CLiC",
"URL": "https://clic.bham.ac.uk/",
"Family": "Corpus query tools",
"Description": "This tool has been developed as part of the <a href=\"https://clic.bham.ac.uk/\">CLiC Dickens project</a>, which demonstrates through corpus stylistics how computer-assisted methods can be used to study literary texts and lead to new insights into how readers perceive fictional characters. Further literary texts have been added to the online service.\nTechnical support is offered through <a href=\"mailto:[email protected]\">email</a>.",
"Functionality": ["Querying/concordancing"]
"Languages": ["eng"],
"License": "Use of CLiC follows the University of Birmingham’s legal policy",
"Size": [],
"Platform": [],
"Infrastructure": "CLARIN-UK",
"Access": {
"Browse": ""
},
"Publication": "Mahlberg et al. (2020)"
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/coanzse.json
@@ -0,0 +1,16 @@
{
"Name": "CoANZSE Audio",
"URL": "https://coanzse.org/landing/index.html",
"Family": "Corpus query tools",
"Description": "This is a dedicated concordancer for the <a href=\"https://cc.oulu.fi/~scoats/CoANZSE.html\">Corpus of Australian and New Zealand Spoken English</a>.\nThe corpus contains 195 million words of geolocated automatic speech recognition transcripts of video content from local governments in Australia and New Zealand, created for the study of lexical, grammatical, phonetic, and discourse-pragmatic phenomena of spoken language. Additionally, the corpus contains complete textual content of the corpus, audio files and forced alignments in <a href=\"https://www.fon.hum.uva.nl/praat/manual/TextGrid.html\">Praat's TextGrid</a> format for most transcripts.\nThe corpus can be accessed through the <a href=\"https://www.clarin.eu/content/service-provider-federation\">CLARIN Service Provider Federation</a>.",
"Functionality": ["Querying/concordancing"]
"Languages": ["eng"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "External",
"Access": {
"Browse": ""
},
"Publication": "Coats (2022)"
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/collocate.json
@@ -0,0 +1,16 @@
{
"Name": "Collocate",
"URL": "https://collocationary.com/",
"Family": "Corpus query tools",
"Description": "This tool is a Windows software program that can be used to find collocations or terms in a corpus. It is a commercial tool.",
"Functionality": ["Concordancing/querying"]
"Languages": ["Language independent"],
"License": "No licence",
"Size": [],
"Platform": ["Windows"],
"Infrastructure": "External",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/compleat.json
@@ -0,0 +1,16 @@
{
"Name": "Compleat Lexical Tutor",
"URL": "https://www.lextutor.ca/",
"Family": "Corpus query tools",
"Description": "This tool includes a concordancer, vocabulary profiler, exercise maker, interactive exercises, and much more.\nIt is possible to upload one's own corpus with this tool (10 MB limit",
"Functionality": ["Querying/concordancing", "corpus upload and analysis"]
"Languages": ["eng", "fra"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "External",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/concgram.json
@@ -0,0 +1,16 @@
{
"Name": "ConcGram",
"URL": "https://benjamins.com/catalog/cls.1",
"Family": "Corpus query tools",
"Description": "This tool is a corpus linguistics software package which is specifically designed to find all the co-occurrences of words in a text or corpus irrespective of variation. This is a commercial tool, available for purchase on optical disc.",
"Functionality": ["Concordancing/querying"]
"Languages": ["Language independent"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "External",
"Access": {
"Browse": ""
},
"Publication": "Greaves (2009)"
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/concordancer-espanol.json
@@ -0,0 +1,16 @@
{
"Name": "Concordancer of the Corpus del Español",
"URL": "https://www.corpusdelespanol.org/",
"Family": "Corpus query tools",
"Description": "This is a querying tool for the corpora from Corpus del Español, which provide billions of words of recent data from 21 Spanish-speaking countries. There are <a href=\"https://www.corpusdelespanol.org/\">four different corpora</a> in the Corpus del Español.",
"Functionality": ["Querying/concordancing"]
"Languages": ["spa"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "External",
"Access": {
"Browse": ""
},
"Publication": "English Corpora"
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/concordancer-estonian.json
@@ -0,0 +1,16 @@
{
"Name": "Concordancer of the Text Corpus of the Institute of the Estonian Language",
"URL": "https://portaal.eki.ee/corpus",
"Family": "Corpus query tools",
"Description": "This tool provides a simple interface for a text corpus. The material for the text corpus has been collected haphazardly, 10.4 million word forms. Approximately 80% of the texts come from newspapers, which is why the corpus is not representative. The corpus also is not tagged, thus being suited for lexical search mainly.",
"Functionality": ["Querying/concordancing"]
"Languages": ["est"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "CELR",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/concordancer-gysseling.json
@@ -0,0 +1,16 @@
{
"Name": "Concordancer of Corpus Gysseling",
"URL": "https://corpusgysseling.ivdnt.org/corpus-frontend/Gysseling/search/",
"Family": "Corpus query tools",
"Description": "This is a dedicated query tool for the <a href=\"https://corpusgysseling.ivdnt.org/corpus-frontend/Gysseling/about\">Corpus Gysseling</a>, developed by the Instituut voor de Nederlandse Taal. The backend of the application is the <a href=\"http://inl.github.io/BlackLab/\">BlackLab Lucene</a>-based search engine developed for corpora with token-based annotation. The web-based frontend is a further development of the <a href=\"https://github.com/INL/corpus-frontend\">corpus-frontend</a> application developed by INT in CLARIN and CLARIAH projects.",
"Functionality": ["Querying/concordancing"]
"Languages": ["nld"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "CLARIAH-NL",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/concordancer-hr-nat-corp.json
@@ -0,0 +1,16 @@
{
"Name": "Concordancer of the Croatian National Corpus",
"URL": "http://filip.ffzg.hr/",
"Family": "Corpus query tools",
"Description": "This is an implementation of NoSketchEngine for the <a href=\"http://hdl.handle.net/11372/LRT-233\">Croatian National Corpus</a>.",
"Functionality": ["Querying/concordancing"]
"Languages": ["hrv"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "CLARIN-HR",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/concordancer-italian-heritage.json
@@ -0,0 +1,16 @@
{
"Name": "Concordancer of the Italian Corpus for the dissemination of culture and the enhancement of the Italian literary heritage",
"URL": "http://dbtvm1.ilc.cnr.it/sitodbt/",
"Family": "Corpus query tools",
"Description": "This tool allows text and corpora querying, supporting both basic information retrieval and advanced search. It allows the customization of the query system functionalities and provides indexing also for morpho-syntactically annotated texts. The system can handle several type of text annotations and make concordances also for parallel bilingual corpora.",
"Functionality": ["Querying/concordancing (non-parallel and parallel)"]
"Languages": ["Language independent"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "CLARIN-IT",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/concordancer-middelnederlands.json
@@ -0,0 +1,16 @@
{
"Name": "Concordancer of Corpus Middelnederlands",
"URL": "https://corpusmiddelnederlands.ivdnt.org/corpus-frontend/MNL/search/",
"Family": "Corpus query tools",
"Description": "This is a dedicated query tool for the <a href=\"https://corpusmiddelnederlands.ivdnt.org/corpus-frontend/MNL/about\">Corpus Middelnederlands</a>.",
"Functionality": ["Querying/concordancing"]
"Languages": ["nld"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "CLARIAH-NL",
"Access": {
"Browse": ""
},
"Publication": ""
}
16 changes: 16 additions & 0 deletions tools/corpus-query-tools/concordancer-portuguese.json
@@ -0,0 +1,16 @@
{
"Name": "Concordancer of O corpus do português",
"URL": "https://www.corpusdoportugues.org/",
"Family": "Corpus query tools",
"Description": "This is a dedicated concordancer for the Corpus of Portuguese developed by Mark Davies.",
"Functionality": ["Querying/concordancing"]
"Languages": ["por"],
"License": "",
"Size": [],
"Platform": [],
"Infrastructure": "External",
"Access": {
"Browse": ""
},
"Publication": "publications"
}
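
Tool records of this shape can be sanity-checked before they are committed. The following is a minimal sketch, not part of the repository: the function name `check_record`, the decision to treat all twelve keys seen in these files as required, and the choice of which keys must hold lists are assumptions made for illustration.

```python
import json

# Keys that appear in every record of this commit. Treating all of them
# as required is an assumption, not a documented schema.
EXPECTED_KEYS = (
    "Name", "URL", "Family", "Description", "Functionality", "Languages",
    "License", "Size", "Platform", "Infrastructure", "Access", "Publication",
)
# Keys whose values are JSON arrays in these records (also an assumption).
LIST_KEYS = ("Functionality", "Languages", "Size", "Platform")


def check_record(text):
    """Return a list of problems found in one tool record (empty if none)."""
    try:
        record = json.loads(text)
    except json.JSONDecodeError as err:
        # A syntax error such as a missing comma between fields ends up here.
        return [f"invalid JSON: {err.msg} at line {err.lineno}"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in record]
    problems += [
        f"{key} should be a list"
        for key in LIST_KEYS
        if key in record and not isinstance(record[key], list)
    ]
    return problems
```

Running `check_record` over the text of each file under `tools/corpus-query-tools/` and reporting any non-empty result would catch syntax slips and missing fields early.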