This repository contains a full introductory course on Computational Social Science (CSS) methods with Python. The teaching materials meet the criteria of a gradable university course and are fully online, self-explanatory, and freely available. They combine coding tutorials with recommended readings, specific teaching lessons, and experience-based guidelines. They are housed here in a public GitHub repository, so anybody can study them. They take the form of Jupyter Notebooks, which have the look and feel of a manuscript yet contain Python code that is fully executable in a browser window, potentially without the need to install Python locally. And they are available under a Creative Commons license that allows you to freely share and adapt them. The course consists of sessions that gradually build participants' Python skills.
The course consists of four sections. The first section teaches how to set up a computing infrastructure and conveys basic data management and scientific computing skills. The second section teaches students how to collect data using dedicated Python packages for accessing Application Programming Interfaces (APIs) and for web scraping. The third section focuses on data preprocessing methods from network analysis and Natural Language Processing (NLP) and includes applications of Large Language Models (LLMs). The fourth section covers data analysis methods and goes into depth on network analysis and modeling, unsupervised and supervised machine learning (ML), and topic modeling. Some datasets are used repeatedly throughout the course, among them a corpus of tweets on the topic of COVID (TweetsCOV19) from May 2020, social networks from the Copenhagen Networks Study (CNS), and the Varieties of Democracy (V-Dem) dataset on countries and principles of democracy. Whenever possible, sessions are interlinked and build on each other.
Read the syllabus here.
Notebooks are developed for the Anaconda distribution 2022.10, which can be downloaded here. For a complete guide on how to set up your computing infrastructure and execute the course materials locally or in the cloud, please consult Session A1: Computing infrastructure. Alternatively, the materials can be executed in the Binder cloud via the Binder button.
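For a local setup, it can help to check which package versions are installed before opening the notebooks. The following is a minimal sketch, not part of the course materials: the package list is a hypothetical example chosen for illustration, and the packages the sessions actually require are documented in Session A1.

```python
import sys
from importlib import metadata

# Hypothetical package list for illustration only; the packages the course
# actually requires are described in Session A1: Computing infrastructure.
PACKAGES = ["numpy", "pandas", "networkx"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Report the interpreter version, then one line per package.
    print("Python", sys.version.split()[0])
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running the script in the environment that serves the notebooks prints the Python version followed by one line per package, which makes it easy to spot a missing package or a version that differs from the intended distribution.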
Section A: Introduction
- Session 1: Computing infrastructure (Ready for testing)
- Session 2: Data management and relational databases (Ready for testing)
- Session 3: Scientific computing and data visualization (Ready for testing)
Section B: Data collection methods
- Session 1: API harvesting
- Session 2: Data parsing and static web scraping
- Session 3: Dynamic web scraping
Section C: Data preprocessing methods
- Session 1: Network construction and visualization (Ready for testing)
- Session 2: Multilayer and multimodal network construction (Ready for testing)
- Session 3: Natural Language Processing (Ready for testing)
Section D: Data analysis methods
- Session 1: Micro-level network analysis and community detection (Ready for testing)
- Session 2: Macro-level network analysis and network modeling (Ready for testing)
- Session 3: Unsupervised machine learning
- Session 4: Topic modeling (Ready for testing)
- Session 5: Supervised machine learning
This course starts with 14 sessions, but more sessions can be added, and we invite contributions. Please contact us before you start developing, and use this template when you develop. Why should you contribute? Because science should be open, and we believe that this course has a future.
This course is edited by Şükrü Atsızelti, Haiko Lietz, and N. Gizem Bacaksizlar Turbic. Authors are Pouria Mirelmi, Olga Zagovora, and Nicolò Gozzi. Contact us here.
The initial 14 sessions were developed as part of the Social ComQuant project, which was funded until 2023 as a twinning project among Koç University (Istanbul, Turkey), GESIS – Leibniz Institute for the Social Sciences (Cologne, Germany), and the ISI Foundation (Torino, Italy) under the European Commission's Horizon 2020 funding line.