Lexical-complexity-measures-and-proficiency

Rossina Soyan, Fall 2021, [email protected]

Description

The goal of the project was to understand which lexical complexity measures correspond to intermediate and advanced proficiency levels in L2 Russian texts. To answer this question, I calculated three lexical complexity measures on a small sub-corpus of L2 Russian texts, performed a hierarchical cluster analysis, and compared the resulting clusters with the original proficiency levels. The goodness-of-clustering analysis showed that 5 out of 8 students (62.5%) were categorized correctly. One reason for this subpar performance may be the size of the sub-corpus, which included only 8 students and 24 texts. Another, more serious reason may be the lexical complexity measures themselves, which were chosen based on studies of L2 English texts and may not reflect proficiency levels in L2 Russian texts. Identifying measures better suited to L2 Russian would be a fruitful area for further work.
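To make the "5 out of 8 (62.5%)" figure concrete, here is a minimal Python sketch of one way to score cluster assignments against known proficiency levels. It is illustrative only: the project's own analysis is in R (final_code.Rmd), the function name `cluster_agreement` is hypothetical, and the cluster IDs below are made-up values chosen to reproduce the reported proportion, not the actual results. The sketch assumes there are no more clusters than proficiency levels.

```python
from itertools import permutations

def cluster_agreement(true_labels, cluster_ids):
    """Share of items whose cluster matches their label, under the best
    one-to-one mapping of cluster IDs to labels.

    Assumes the number of clusters does not exceed the number of labels.
    """
    labels = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    best_hits = 0
    # Cluster IDs are arbitrary, so try every way of naming them.
    for perm in permutations(labels, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)

# Hypothetical example: 4 intermediate and 4 advanced students,
# split into two clusters (IDs invented for illustration).
levels = ["intermediate"] * 4 + ["advanced"] * 4
clusters = [1, 1, 1, 2, 1, 2, 2, 1]
print(cluster_agreement(levels, clusters))  # 0.625, i.e. 5 of 8 students
```

Trying every label-to-cluster mapping matters because a clustering algorithm numbers its clusters arbitrarily; with only two clusters this amounts to checking both possible namings and keeping the better one.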

Dataset

The dataset for this project was randomly sampled from the Middlebury corpus of L2 Russian texts. The Middlebury corpus is not currently publicly accessible, but I worked as an RA during its compilation, and the PI, Dr. Olesya Kisselev, gave me permission to explore the corpus as part of the coursework for my statistics courses. The corpus consists of essays written by students for the placement examination (pre-test) and the final examination (post-test) in the summer of 2019. The original corpus includes 601 essays (103,150 words total) by 133 L2 Russian learners at different levels of proficiency. The sub-corpus for this project includes 24 essays (4,854 words total) by 8 L2 Russian learners: 4 students rated as intermediate and 4 rated as advanced.

Repo directory

final_report.md: overall description of the project, theoretical contextualization, analysis, and the story behind the final product
README.md: you are here
LICENSE.md: licensing terms
project_plan.md: my original plan before I started coding in R
progress_report.md: four progress reports written throughout the semester
presentation.pdf: PDF of the final presentation I gave at the end of the semester
final_code.Rmd: loading of the sub-corpus, calculation of the LC measures, and cluster analysis
final_code.md: the knitted output of final_code.Rmd in Markdown format
data_sample: an example of a text in the sub-corpus
non_lexical_items: .txt files that I created in order to calculate the lexical density of the corpus texts
images: images of the cluster analysis
scratchpad: all the drafts I went through on my way to the final project
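
The non_lexical_items entry above refers to word lists used for lexical density, which is commonly defined as the share of lexical (content) words among all words in a text. A minimal Python sketch of that calculation follows. It is an illustration under assumptions, not the project's R implementation: the function name `lexical_density`, the regex tokenizer, and the three-word list are all hypothetical, and the format of the actual .txt files is not described here.

```python
import re

def lexical_density(text, non_lexical_items):
    """Ratio of lexical (content) words to all word tokens in `text`.

    `non_lexical_items` is a set of lowercase function words to exclude.
    """
    # \w matches Cyrillic letters too, since Python 3 patterns are Unicode-aware.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    lexical = [t for t in tokens if t not in non_lexical_items]
    return len(lexical) / len(tokens)

# Hypothetical word list and sentence, for illustration only.
function_words = {"я", "в", "и"}
sentence = "Я читаю книгу в библиотеке"
print(lexical_density(sentence, function_words))  # 0.6 (3 of 5 tokens)
```

Excluding a fixed list of function words is one simple way to separate content words from the rest; a part-of-speech tagger would be an alternative, at the cost of an extra dependency.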

Guestbook

Link to the guestbook
