AGILE2023-Semantic-complexity-GeoAnQu

Python/R source code and datasets used for the submission to the AGILE2023 conference.

Requirements

It is strongly advised to run this code on Windows 10. The underlying third-party Python libraries are known to have issues on Linux and macOS.
Python 3.9.7. The instructions are for the Conda environment on Windows.
R 4.2.2 running with the RStudio IDE.

Content structure

"main.py": run this Python code to apply the transformation parser on the five corpora. See the section below on how to run this code.
"geo_question_parser_haiqi": this folder contains the transformation parser library.
"inputCorpora": this folder contains the five corpora.
- "[corpus name].txt": a table listing all questions of the corpus. The column "Question" lists the original questions in the corpus. The column "RQuestion" lists the revised questions to which the transformation parser is applied.
- "[corpus name]_missing.json": manually created transformations for questions that the transformation parser did not parse correctly.
"outputData": all output from "main.py" is stored inside this folder.
- "[corpus name].json": contains concepts ("types" json object) and transformations ("transformations" json object) for each question in the corpus.
- "[corpus name]_ParserStats_r.json": contains descriptive statistics about the concepts and transformations of each question:
  - "Question": question string
  - "qTypesCount": number of concepts identified in the question (size of "types" json object)
  - "qTransCount": number of transformations identified in the question (size of "transformations" json object)
  - "qOutputType": goal concept of the question
  - "[concept name]": 1 if the concept appears in the question or 0 otherwise
"statsScript.R": R code for statistical analysis of the output inside the folder "outputData". See the section below on how to run this code.

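The per-question fields of the "[corpus name]_ParserStats_r.json" files described above can be explored with a few lines of Python. The sketch below is only illustrative: it assumes the file is a JSON array with one object per question, and the sample questions and the concept names "object" and "field" are made up for the example rather than taken from the actual output.

```python
import json
from collections import Counter

# Hypothetical sample that mimics the documented fields. The real file layout
# (assumed here to be a JSON array of per-question objects) may differ.
SAMPLE = json.loads("""
[
  {"Question": "What is the population density in Utrecht?",
   "qTypesCount": 3, "qTransCount": 2, "qOutputType": "field",
   "object": 1, "field": 1},
  {"Question": "Where are the schools in Amsterdam?",
   "qTypesCount": 2, "qTransCount": 1, "qOutputType": "object",
   "object": 1, "field": 0}
]
""")

# Keys that describe the question itself; every other key is treated
# as a 0/1 concept indicator.
RESERVED = {"Question", "qTypesCount", "qTransCount", "qOutputType"}


def summarize(records):
    """Compute simple descriptive statistics over per-question records."""
    n = len(records)
    concept_frequency = Counter()
    for record in records:
        for key, value in record.items():
            if key not in RESERVED and value == 1:
                concept_frequency[key] += 1
    return {
        "questions": n,
        "mean_types": sum(r["qTypesCount"] for r in records) / n,
        "mean_trans": sum(r["qTransCount"] for r in records) / n,
        "goal_concepts": Counter(r["qOutputType"] for r in records),
        "concept_frequency": concept_frequency,
    }


summary = summarize(SAMPLE)
print(summary)
```

To apply this to real output, replace SAMPLE with the result of json.load() on one of the files in "outputData", after checking that the file really has the assumed layout.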
Running the Python code "main.py"

Setting up the conda python environment:

1. Install the 64-bit version of Miniconda 4.10.3 from https://repo.anaconda.com/miniconda/ (available for Windows, macOS and Linux).

2. Open a new Anaconda Prompt window and create a new conda environment named "qparser":

conda create -n qparser python=3.9.7

3. Activate the new environment:

conda activate qparser

4. Install the allennlp package:

pip install allennlp

5. Install the allennlp-models package:

pip install allennlp-models

6. Install the spacy package from conda-forge:

conda install -c conda-forge spacy

7. Install the spacy trained pipeline. If the installation throws an error, try executing the command again.

python -m spacy download en_core_web_sm

8. Install other packages from conda-forge:

conda install -c conda-forge antlr4-python3-runtime=4.9.3 word2number pyzmq nltk

9. Install the required nltk data packages:

python -m nltk.downloader averaged_perceptron_tagger

python -m nltk.downloader omw-1.4

10. If it turns out to be needed, also install the checklist package:

pip install checklist
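After the installation steps above, a short Python check can confirm that the packages are visible inside the "qparser" environment. This is a minimal sketch, not part of the repository: the helper name missing_modules is hypothetical, and the list assumes the usual import names of the installed packages (for example "zmq" for pyzmq and "antlr4" for antlr4-python3-runtime). It does not verify the spacy pipeline or the nltk data downloads.

```python
import importlib.util

# Import names assumed to correspond to the packages installed above.
REQUIRED_MODULES = [
    "allennlp",
    "allennlp_models",
    "spacy",
    "antlr4",
    "word2number",
    "zmq",
    "nltk",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it from the Anaconda Prompt with the "qparser" environment activated, so that the check inspects the same interpreter that will run "main.py".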

Running "main.py"

In the Anaconda Prompt, perform the following steps:

1. Navigate to the folder containing "main.py". For example, "cd C:\AGILE2023-Semantic-complexity-GeoAnQu-main".

2. Execute "main.py" with the "python" command.
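Once "main.py" has finished, one way to confirm the run is to check that every corpus in "inputCorpora" has its two output files in "outputData". The sketch below is an assumption-based helper, not part of the repository: the function name missing_outputs is hypothetical, and it assumes that the output file names are derived directly from the names of the ".txt" corpus files, as the content structure suggests.

```python
from pathlib import Path


def missing_outputs(input_dir, output_dir):
    """List expected output files that are absent from output_dir.

    For every "<corpus name>.txt" in input_dir, the files
    "<corpus name>.json" and "<corpus name>_ParserStats_r.json"
    are expected in output_dir (naming assumed from the README).
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    missing = []
    for corpus_file in sorted(input_dir.glob("*.txt")):
        name = corpus_file.stem
        for expected in (f"{name}.json", f"{name}_ParserStats_r.json"):
            if not (output_dir / expected).exists():
                missing.append(expected)
    return missing


if __name__ == "__main__":
    # Run from the folder that contains "main.py".
    print(missing_outputs("inputCorpora", "outputData"))
```

An empty list means that all expected files are present. It does not say anything about whether their contents are correct.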

Running the R code "statsScript.R"

It is recommended to execute the R code with the latest version of RStudio running on R version 4.2.2.

Before executing "statsScript.R", make sure to assign the variable "source" within the R code to the path of the folder containing "statsScript.R".

All necessary packages will be installed automatically upon the first execution of "statsScript.R".

Analysis results will be printed in the console of RStudio and visualized as plots.
