Kindly aims to end cyberbullying and make students feel safer, by leveraging the latest advances in technology and empowering children to solve this pressing challenge. Kindly is a collaboration between young innovator Gitanjali Rao, who conceived the initial proof of concept, and UNICEF, which is currently leading its development as a digital public good and exploring its implementation at scale.
Did you know that 1 in 3 children is bullied? According to the UNESCO Institute for Statistics, one third of the globe's youth is bullied; this ranges from as low as 7% in Tajikistan to 74% in Samoa.
In the United States, about 1 in 5 students (20% of students ages 12-18) experienced bullying1.
Bullying includes three core elements:
- unwanted behavior
- observed or perceived power imbalance
- repetition or high likelihood of repetition of bullying behaviors
Bullying affects all youth, including those who are bullied, those who bully others, and those who witness bullying. The effects of bullying may continue into adulthood. Among students ages 12-18 who reported being bullied at school during the school year in the United States, 15% were bullied online or by text1.
Within the next 3 months, we aim to:
- Create a public API in an open-source environment, making it easier and more accessible for other technical users to contribute to
- Develop it as a digital public good, opening up the training data sets and machine learning model to encourage wide collaboration
- Build an open-source community around it, empowering youth around the world to collaborate in tackling cyberbullying
- Raise greater awareness around cyberbullying
This project consists of three primary building blocks:
- Python Training Server: This server rebuilds the model using data obtained from the Continuous Integration (CI) server. Because the training data can grow very large, a GPU is ideally required to perform this task quickly; however, a CPU is sufficient while the training data is relatively small (say, 30-50 MB). The model file generated at the end of the process is stored on the Python API Server to be used for prediction on the frontend client. In the absence of a dedicated server, the current implementation uses Hugging Face (huggingface.co), a service that hosts large model files and repositories for use by data scientists. A minimal retraining sketch is shown after the infrastructure list below.
- Python API Server: This server hosts the Machine Learning (ML) model and the initial Application Programming Interface (API) against which text is checked. Only a single endpoint is currently provided (a hedged sketch of such an endpoint follows this list).
- NodeJS Client Server: This server hosts the frontend that users interact with. It calls the Python API Server to check whether the input text is offensive and reports the result back to the user. The client also sends training data submitted through a form to GitHub, where it is saved in the repository as a text file.
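To make the division of labour between the Python API Server and the client more concrete, the sketch below shows what the single prediction endpoint could look like. The endpoint path (`/check`), the request fields, the model repository name, and the use of FastAPI with the Hugging Face `transformers` pipeline are assumptions for illustration, not the actual Kindly implementation.

```python
# Hypothetical sketch of the single prediction endpoint on the Python API Server.
# The path, payload shape, and model repository name are placeholders, not the real Kindly API.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the classification model produced by the training step (repository name is hypothetical).
classifier = pipeline("text-classification", model="unicef/kindly-model")

class CheckRequest(BaseModel):
    text: str

@app.post("/check")
def check(request: CheckRequest):
    # Classify the submitted text and report the label and confidence back to the client.
    result = classifier(request.text)[0]
    return {"label": result["label"], "score": result["score"]}
```

The NodeJS Client Server would then POST user input to such an endpoint and display the result, as described above.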
Additionally, it also leverages the following infrastructure:
- GitHub hosts the source code of both the NodeJS Client Server and the Python API Server. It also hosts the training data coming from the form on the NodeJS Client website as text files (storing training data as plain text files appears to be a common convention in the ML field).
- CI/CD Pipelines trigger the client website and the API to be rebuilt and deployed whenever code is updated in GitHub. They also pass updated training data text files and model updates on to the Python Training Server.
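As a rough illustration of this data flow, here is a minimal sketch of how the Python Training Server might rebuild the model from the text files handed over by the CI pipeline. The directory layout, the tab-separated label format, and the choice of a simple scikit-learn baseline are assumptions for illustration; the actual training code may differ.

```python
# Hypothetical retraining sketch for the Python Training Server.
# The directory layout and "<label>\t<text>" line format are assumptions, not the real data schema.
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def load_training_data(data_dir: str):
    texts, labels = [], []
    # Each text file is assumed to hold one example per line: a label, a tab, then the message.
    for path in Path(data_dir).glob("*.txt"):
        for line in path.read_text(encoding="utf-8").splitlines():
            label, _, text = line.partition("\t")
            if text:
                labels.append(label)
                texts.append(text)
    return texts, labels

texts, labels = load_training_data("training-data")

# A small CPU-friendly baseline, in line with the note above that a CPU suffices
# while the training data stays in the 30-50 MB range.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Persist the model file so it can be published to the Python API Server or a Hugging Face repository.
joblib.dump(model, "kindly-model.joblib")
```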
The interaction between the various components described above is illustrated in the diagram below:
Note: The flowchart has been created with MermaidJS. To update it, do the following:
- Click on the SVG image above, which will open a live editor with the current version (content is encoded in the URL)
- Once you have made the changes, click on `Actions` in the live editor, and then `Copy Markdown`
- Update the Markdown above and commit the changes on this README.
Note: The roadmap has been created with MermaidJS. To update it, do the following:
- Click on the SVG image above, which will open a live editor with the current version (content is encoded in the URL)
- Once you have made the changes, click on `Actions` in the live editor, and then `Copy Markdown`
- Update the Markdown above and commit the changes on this README.
- Refer to docs/development.md for information related to doing development with this repository.
- Refer to docs/deployment.md for information related to deploying the code in this repository into production.
- Refer to docs/api.md for information related to the API endpoints available in production.
The following projects have also leveraged different forms of technology to combat cyberbullying. These served as additional inspiration as we aimed to create an open-source, community-driven product.
- Perspective is a free API that uses machine learning to identify "toxic" comments. Perspective returns a percentage that represents the likelihood that someone will perceive the text as toxic. Perspective requires users to register in order to access the API. It also requires users to have a Google account and a Google Cloud project to authenticate API requests. Currently, there is no fee to use it, but in the future, increases to QPS (queries per second) may incur a fee (Source). A hedged request example is shown after this list.
- AS Tracking by STEER is an AI solution that compares the online psychological test results provided by students with its psychological model to flag which students may need more attention and support. This is a commercial product that aims to sell to various school groups.
- @dhavalpotdar created a project on GitHub to detect cyberbullying in tweets using ML Classification Algorithms. However, this project is not active as the last commit was made 2 years ago.
- Academic researchers from Ghent University, University of Antwerp, and University of Cape Town published a research paper focusing on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying (Van Hee, Cynthia et al. "Automatic detection of cyberbullying in social media text." PLoS ONE vol. 13,10 e0203794. 8 Oct. 2018, doi:10.1371/journal.pone.0203794).
- An academic researcher from Tampere University published a research paper, proposing a model training scheme that can employ fairness constraints on cyberbullying detection models (O. Gencoglu, "Cyberbullying Detection With Fairness Constraints," in IEEE Internet Computing, vol. 25, no. 1, pp. 20-29, 1 Jan.-Feb. 2021, doi: 10.1109/MIC.2020.3032461).
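For readers curious how the Perspective API mentioned above is typically queried, the sketch below follows the request shape from Perspective's public documentation; the API key, the sample comment, and the surrounding script are placeholders, and this is not part of Kindly itself.

```python
# Illustrative query against the Perspective API (request shape per its public docs).
# The API key and the sample comment are placeholders.
import requests

API_KEY = "YOUR_API_KEY"  # requires a Google Cloud project with the Perspective API enabled
url = f"https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key={API_KEY}"
payload = {
    "comment": {"text": "example comment to score"},
    "requestedAttributes": {"TOXICITY": {}},
}
response = requests.post(url, json=payload)

# Perspective returns a probability-like score that readers would perceive the text as toxic.
score = response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print(f"Toxicity: {score:.2f}")
```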
UNICEF works in over 190 countries and territories to protect the rights of every child. UNICEF has spent more than 70 years working to improve the lives of children and their families. At UNICEF, we believe all children have a right to survive, thrive and fulfill their potential, to the benefit of a better world.
This repository contains both software and data, and a different license applies to each.
- Software included in this repository is licensed under the GNU Affero General Public License v3.0
- Data included in this repository is licensed under the Creative Commons Attribution-ShareAlike 4.0
This project was initially conceived by Gitanjali Rao as a proof of concept that included a Natural Language Processing (NLP) model developed using Microsoft Azure's Language Understanding (LUIS) cloud-based engine, an Application Programming Interface (API), and two front-ends: a Chrome browser extension and a standalone app. All the original materials can be found at https://kindly.godaddysites.com/
UNICEF's current work on Kindly is a collaboration between multiple units:
- UNICEF Division of Communication, which highlighted Gitanjali's work through the Voices of Youth project The future of internet safety, reimagined in June 2020, and manages communications and outreach for this initiative.
- UNICEF Office of Innovation, which is currently leading the technical development
- UNICEF Information and Communication Technology Division (ICTD), which is supporting the IT infrastructure and scale-up strategy
- UNICEF Child Protection, TBD