Can Feature Predictive Power Generalize? Benchmarking Early Predictors of Student Success across Flipped and Online Courses

Personal notes

This fork's sole purpose is to reference the project from my personal GitHub page and describe my technical contributions. In a few words, we improve student success prediction in flipped classrooms (lectures are watched at home while harder exercises are done in class) and compare features with existing methods for MOOCs (in which everything is done at home). I contributed mostly by implementing features from the literature and testing them. A significant part of the implementation lies in the data processing and handling, as the data consists of multiple time series per student covering videos, quizzes, problems, and the forum. This repository demonstrates my pandas and NumPy fluency as well as my software engineering skills: collaborating with git, and developing and testing with scientific rigor.

A typical example of my contributions is the weekly_prop_interrupted feature, which counts how many videos were left unfinished per week. Beyond the pandas implementation, I first had to gather all the necessary data, such as which videos are supposed to be watched in a given week. On one side I scraped the video schedule from the course platform; on the other I parsed the video platform contents, and finally merged the two sources to determine which videos were assigned to which week.
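
To give an idea of the pandas work involved, here is a rough, hypothetical sketch of such a feature. The column names follow the data formats documented in the Data Preparation section below, but the file paths, the event_type value, and the definition of a finished video are illustrative assumptions rather than the repository's actual implementation:

import pandas as pd

# Illustrative sketch only: column names follow the formats in "Data Preparation",
# but the file paths, the event_type value and the notion of a "finished" video
# are assumptions, not the repository's actual code.
video_events = pd.read_csv("video_event/course_x.csv")  # user_id, video_id, event_type, timestamp, ...
schedule = pd.read_csv("schedule/course_x.csv")         # id, type, chapter, subchapter, date, ...

# Map each scheduled video to the course week it belongs to.
schedule["date"] = pd.to_datetime(schedule["date"])
schedule["week"] = (schedule["date"] - schedule["date"].min()).dt.days // 7
video_weeks = (schedule[schedule["type"] == "video"]
               .rename(columns={"id": "video_id"})[["video_id", "week"]])

# A video counts as finished if the student ever emitted a stop event for it
# ("Video.Stop" is a hypothetical event_type value).
events = video_events.merge(video_weeks, on="video_id")
finished = (events.assign(finished=events["event_type"].eq("Video.Stop"))
                  .groupby(["user_id", "week", "video_id"])["finished"]
                  .max())

# Per student and week: the fraction of watched videos that were never finished.
weekly_prop_interrupted = 1 - finished.groupby(["user_id", "week"]).mean()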

Abstract

This repository is the official implementation of the EDM 2021 paper entitled "Can Feature Predictive Power Generalize? Benchmarking Early Predictors of Student Success across Flipped and Online Courses".

Our approach

Early predictors of student success are becoming a key tool in flipped and online courses to ensure that no student is left behind along course activities. However, with an increased interest in this area, it has become hard to keep track of what the state of the art in early success prediction is. Moreover, prior work on early success prediction based on clickstreams has mostly focused on implementing features and models for a specific online course (e.g., a MOOC). It remains therefore under-explored how different features and models enable early predictions, based on the domain, structure, and educational setting of a given course. In this paper, we report the results of a systematic analysis of early success predictors for both flipped and online courses. In the first part, we focus on a specific flipped course. Specifically, we investigate eight feature sets, presented at top-level educational venues over the last few years, and a novel feature set proposed in this paper and tailored to this setting. We benchmark the performance of these feature sets using an RF classifier, and we provide and discuss an ensemble feature set optimized for the target flipped course. In the second part, we extend our analysis to courses with different educational settings (i.e., MOOCs), domains, and structure. Our results show that (i) the ensemble of optimal features varies depending on the course setting and structure, and (ii) the predictive performance of the optimal ensemble feature set depends on the course activities.

Requirements

The code has been tested under Python >= 3.6.

Install dependencies:

git clone https://github.com/d-vet-ml4ed/flipped-classroom.git
cd flipped-classroom
pip install -r requirements.txt

We recommend setting PYTHONPATH before running the code:

export PYTHONPATH=${PYTHONPATH}:`pwd`

Data Preparation

By default, course grades, problem events, video events, and course schedules are expected to be stored in data/course/{flipped-classroom|mooc}/{courseplatform}.

mkdir -p data/course/{flipped-classroom|mooc}/{courseplatform}/grade
mkdir -p data/course/{flipped-classroom|mooc}/{courseplatform}/problem_event
mkdir -p data/course/{flipped-classroom|mooc}/{courseplatform}/video_event
mkdir -p data/course/{flipped-classroom|mooc}/{courseplatform}/schedule

The folder data/course/{flipped-classroom|mooc}/{courseplatform} must also contain a metadata.csv file [format: course_id, title, teachers, period, start_date, end_date, weeks, grade_thr, grade_max] with the course's descriptive attributes.

For each course, we expect the following data to be provided in the corresponding subfolders below (see the sanity-check sketch after the list):

  • grade/{course_id}.csv [format: user_id, grade, date].
  • problem_event/{course_id}.csv [format: user_id, problem_id, event_type, timestamp, problem_type, grade, submission_number].
  • video_event/{course_id}.csv [format: user_id, video_id, event_type, timestamp, seek_type, old_time, current_time, new_time, old_speed, new_speed].
  • schedule/{course_id}.csv [format: id, type, chapter, subchapter, date, duration, grade_max].
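
As a quick sanity check of the layout above, a small script along the following lines (not part of the repository; the course folder and course_id are placeholders) can verify that each file exposes the expected columns:

from pathlib import Path
import pandas as pd

# Hypothetical course folder and course_id, used only to illustrate the expected layout.
course_dir = Path("data/course/flipped-classroom/courseplatform")
course_id = "course_x"

metadata = pd.read_csv(course_dir / "metadata.csv")
assert {"course_id", "start_date", "end_date", "weeks"} <= set(metadata.columns)

expected_columns = {
    "grade": {"user_id", "grade", "date"},
    "problem_event": {"user_id", "problem_id", "event_type", "timestamp"},
    "video_event": {"user_id", "video_id", "event_type", "timestamp"},
    "schedule": {"id", "type", "chapter", "date"},
}

for subfolder, required in expected_columns.items():
    df = pd.read_csv(course_dir / subfolder / f"{course_id}.csv")
    missing = required - set(df.columns)
    if missing:
        print(f"{subfolder}/{course_id}.csv is missing columns: {missing}")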

Feature Extraction

To extract a set of features for a course, run this command:

python routine/compute_feature_set.py \
    --model "extractor.set.marras_et_al.MarrasEtAl" \
    --courses "{flipped-classroom|mooc}/{course_id}" \
    --timeframe "{lq-week|eq-week}" \
    --workdir "../data/result/edm21/feature/"

By default, features are expected to be stored in ./data/result/edm21/feature/.

Each course feature set is saved in a folder named as follows:

{lq-week|eq-week}-{feature_set}-{course_id}

For each feature set, the following files are stored in the above folder (see the loading sketch after the list):

  • feature_labels.csv [format: user_index, label-grade, label-pass-fail, label-dropout, label-stopout]
  • feature_values.npz [shape: (no_users, no_weeks, no_features)]
  • settings.txt
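
Not part of the repository, but a minimal sketch of how such an output could be inspected; the folder name below is a hypothetical instance of the naming scheme above, and the array key inside the .npz archive is an assumption:

import numpy as np
import pandas as pd

# Hypothetical output folder following the naming pattern above.
feature_dir = "data/result/edm21/feature/eq-week-marras_et_al-course_x"

labels = pd.read_csv(f"{feature_dir}/feature_labels.csv")
archive = np.load(f"{feature_dir}/feature_values.npz")

# The array key inside the archive is not documented here, so list the keys first.
print(archive.files)
values = archive[archive.files[0]]

print(values.shape)                              # expected: (no_users, no_weeks, no_features)
print(labels["label-pass-fail"].value_counts())  # class balance of the pass/fail label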

Training and Evaluation

To train and evaluate a predictor on a set of features, run this command:

python routine/train_predictor.py \
    --predictor "predictor.random_forest.RandomForest" \
    --target_col "label-pass-fail" \
    --feature_set "{lq-week|eq-week}-{feature_set}-{course_id}" \
    --workdir "../data/result/edm21/" \
    --folds 10

By default, models and evaluations are expected to be stored in ./data/result/edm21/.

Each predictor is saved in a folder named as follows:

{lq-week|eq-week}-{course_id}-{target_col}-{model}-{feature_set}-{aggregation_type}

For each predictor, the following files are stored in the above folder (see the aggregation sketch after the list):

  • params.txt
  • predictor-w{week_id}-f{fold_id}.h5 files
  • stats.csv [format: week, fold, y_train_idx, y_test_idx, auc, bal_acc, f1, acc_fail, acc_pass, bthr, ypred_proba, ypred, ytrue]
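
Once training has finished, the per-fold results in stats.csv can be aggregated, for instance by averaging performance per week over the folds. The folder name below is a hypothetical instance of the pattern above, not an actual output of the repository:

import pandas as pd

# Hypothetical predictor folder following the naming pattern above.
result_dir = "data/result/edm21/eq-week-course_x-label-pass-fail-RandomForest-marras_et_al-avg"

stats = pd.read_csv(f"{result_dir}/stats.csv")

# Average AUC and balanced accuracy per course week, across the cross-validation folds.
summary = stats.groupby("week")[["auc", "bal_acc"]].mean().round(3)
print(summary)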

Contributing

This code is provided for educational purposes and aims to facilitate the reproduction of our results and further research in this direction. We have done our best to document, refactor, and test the code before publication.

If you find any bugs or would like to contribute new models, training protocols, etc., please let us know.

Please feel free to file issues and pull requests on the repo and we will address them as we can.

Citations

If you find this code useful in your work, please cite our paper:

Marras, M., Vignoud, J., Käser, T. (2021). 
Can Feature Predictive Power Generalize? Benchmarking Early Predictors of Student Success across Flipped and Online Courses. 
In: Proceedings of the 14th International Conference on Educational Data Mining (EDM 2021). 

License

This code is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This software is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for details.

You should have received a copy of the GNU General Public License along with this source code. If not, go to the following link: http://www.gnu.org/licenses/.
