GitHub - Data-Sci-2021/L2-Prosody-Analysis: This is Miroo Lee's project repo for Data Science (LING 2340). The goal of this project is to investigate how prosody develops in L2 learners' speech by examining temporal modifications of phonetic segments near pauses.

L2-Prosody-Analysis

**Miroo Lee ([email protected]) 12-13-2021**

1.Overview

This is Miroo Lee's project repo for Data Science (LING 2340). The goal of this project is to investigate how L2 learners' speech develops the rhythmic properties of prosody by examining temporal modifications of phonetic segments as a function of lexical stress and domain-initial boundary lengthening.

2.Dataset

The data set I started my project with comes from the PELIC speech corpus from the University of Pittsburgh. The PELIC speech corpus is a large learner corpus, and the current project examines 2-minute semi-spontaneous monologues by Korean students. You can find more information about the corpus here.
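As a rough illustration of this selection step, here is a minimal Python sketch that filters a metadata table by first language and task type. It is not the code in this repo (which uses R), and the column names (`filename`, `speaker_id`, `L1`, `task_type`) and all rows are made up for the example; they are not the corpus's actual schema.

```python
import csv
import io

# Hypothetical metadata table. The column names and values are invented
# for illustration and do not reflect the real corpus files.
SAMPLE_METADATA = """filename,speaker_id,L1,task_type
a001.wav,s01,Korean,monologue
a002.wav,s01,Korean,dialogue
a003.wav,s02,Arabic,monologue
a004.wav,s03,Korean,monologue
"""


def select_recordings(rows, l1, task_type):
    """Return the filenames of rows matching the given L1 and task type."""
    return [
        row["filename"]
        for row in rows
        if row["L1"] == l1 and row["task_type"] == task_type
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_METADATA)))
korean_monologues = select_recordings(rows, "Korean", "monologue")
print(korean_monologues)
```

With the sample rows above, only the two Korean monologue recordings are kept.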

3.Contents

In addition to this README file, there are four folders and fifteen other files.

In the root folder:

final_report.md describes the results of the data analysis.
README.md is the current document you are reading.
LICENSE describes the licensing term for the project.
.gitignore has git ignored file entries.
project_plan.md describes the initial plan for the project.
project_progress.md shows three progress reports throughout the semester.
presentation.pdf contains the slides of the presentation I gave at the end of the semester. This presentation only included the preliminary data analysis. More detailed results are documented in final_report.md.
search_wav.Rmd contains code for identifying wav file names by filtering on L1, level, and task type.
search_wav.md is the same as the above, but as an md file.
KOR_mono.csv is an output of search_wav.md. It is a list of two-minute monologue speech files of Korean speakers who were enrolled for three semesters.
KOR_mono_scripts.csv is another output of search_wav.md. It is a list of transcripts for the corresponding speech files.
export_from_three_tiers.praat is a Praat script that compiles annotated information from multiple Praat TextGrids into a single txt file.
wordList.csv contains a list of words found in wav_SAMPLES. The list also contains syllable structure and lexical stress information of each word.
new_wordList.csv contains a list of words found in three wav files from the speaker ea4.
data_analysis.Rmd contains code for data cleaning & analysis.
data_analysis.md is the same as the above, but as an md file.
plots has plots from data_analysis.Rmd.
scratchpad has code I tried and documented for my project.
wav has the 129 wav files listed in KOR_mono.csv.
wav_SAMPLES has the subset of wav files from wav that are annotated in TextGrid files.
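To make the idea of compiling annotation tiers into one flat table more concrete, here is a small Python sketch. It is not the Praat script in this repo: the two tiers (words and segments), the labels, and the time stamps are all invented for illustration. Each segment is matched to the word interval containing its midpoint, and its duration is written out in milliseconds.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str


# Made-up annotations; tier contents are assumptions for this example.
words = [
    Interval(0.00, 0.50, "about"),
    Interval(0.50, 0.80, ""),       # unlabeled interval, e.g. a pause
    Interval(0.80, 1.20, "time"),
]
segments = [
    Interval(0.00, 0.08, "AH"),
    Interval(0.08, 0.16, "B"),
    Interval(0.16, 0.40, "AW"),
    Interval(0.40, 0.50, "T"),
    Interval(0.80, 0.90, "T"),
    Interval(0.90, 1.10, "AY"),
    Interval(1.10, 1.20, "M"),
]


def containing(tier, time) -> Optional[Interval]:
    """Return the interval in `tier` that contains `time`, if any."""
    for interval in tier:
        if interval.start <= time < interval.end:
            return interval
    return None


rows = []
for seg in segments:
    midpoint = (seg.start + seg.end) / 2
    word = containing(words, midpoint)
    if word is None or not word.label:
        continue  # skip segments that fall outside any labeled word
    duration_ms = round((seg.end - seg.start) * 1000)
    rows.append((word.label, seg.label, duration_ms))

# One tab-separated line per segment, with a header row.
table = "\n".join(
    ["word\tsegment\tduration_ms"] + [f"{w}\t{s}\t{d}" for w, s, d in rows]
)
print(table)
```

Matching on the segment midpoint rather than on exact boundaries avoids problems when boundaries on different tiers differ by tiny amounts.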
