Frida Hæstrup (201805753)
This is my personal repository with code and data related to my exam in the Spring 2021 module Language Analytics, part of the bachelor's elective in Cultural Data Science at Aarhus University. The portfolio contains five projects:
Project | Description |
---|---|
1 | Keyword collocation across a text corpus |
2 | Network analysis of entities in documents |
3 | (Un)supervised machine learning |
4 | Text classification using Deep Learning |
5 | Topic Modelling on religious texts |
This repository has the following directory structure:
LanguageAnalytics2021/
├── data/ #data folders for each project
│ └── project1/
│ └── project2/
│ └── project3/
│ └── project4/
│ └── project5/
├── src/ #Python scripts for each project
│ └── project1/
│ │ └── out/
│ │ └── collocation.py
│ └── project2/
│ │ └── output/
│ │ └── viz/
│ │ └── network.py
│ └── project3/
│ │ └── out/
│ │ └── LR_philosophicalTexts.py
│ └── project4/
│ │ └── out/
│ │ └── LR_GOT.py
│ │ └── DL_GOT.py
│ └── project5/
│ │ └── out/
│ │ └── religious_topics.py
├── utils/ #utility functions
│ └── *.py
├── figures/ #figures to use in READMEs
│ └── *.png
The scripts for each project are in the src folder, together with a description of how to run them.
To run the scripts in this repository, I recommend cloning the repository and installing the relevant dependencies in a virtual environment:
$ git clone https://github.com/frillecode/LanguageAnalytics2021
$ cd LanguageAnalytics2021
$ bash ./create_venv.sh # use create_venv_win.sh on Windows
If some libraries/modules are not installed correctly when the virtual environment is created, install them manually by running the following:
$ cd LanguageAnalytics2021
$ source cds-lang/bin/activate
$ pip install {module_name}
$ deactivate
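To find out which modules need a manual install, a small check like the one below can help. This is a minimal sketch, not part of the repository: the helper name missing_modules is hypothetical, and the module names in the list are placeholders rather than the project's actual dependency list. It only uses importlib.util.find_spec from the standard library, which returns None when a top-level module cannot be found.

```python
"""Report which modules cannot be found in the active environment.

Hypothetical helper, not part of the repository. The module names in
`required` are placeholders; replace them with the modules the scripts import.
"""
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Placeholder list, assumed for illustration only.
    required = ["json", "sqlite3", "pandas", "spacy"]
    missing = missing_modules(required)
    if missing:
        print("Install manually with pip:", " ".join(missing))
    else:
        print("All listed modules are available.")
```

Run it with the virtual environment activated (source cds-lang/bin/activate). Keep in mind that the name you import is not always the name you install: the sklearn module, for example, is installed with pip install scikit-learn.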
This project is licensed under the MIT License - see the LICENSE file for details.
Credit for the utility scripts and the original repository structure goes to Ross Deans Kristensen-McLachlan.