AI-Generated Text Detection using BERT

This project was built for the Kaggle competition "LLM - Detect AI Generated Text": https://www.kaggle.com/competitions/llm-detect-ai-generated-text

Introduction

This project detects AI-generated text in a given dataset. It trains a classifier based on BERT (Bidirectional Encoder Representations from Transformers) to distinguish human-authored content from machine-generated text, with the aim of supporting integrity and security in digital communications.

Workflow

The project follows a structured workflow:

1. Data Preprocessing: Clean the text by removing noise, stop words, punctuation, and non-alphabetic characters using BERT-preprocess.
2. Additional Datasets: Collect datasets from other competitions and concatenate them to increase the number of training instances, which gives the model more examples from which to learn distinguishing features and patterns.
3. Model Training: Train a BERT-based sequence classification model to distinguish human-written text from AI-generated text.
4. Predictions: Run the trained model on the test data to flag text that is likely AI-generated.
5. Result Analysis: Save the predictions to a CSV file for submission and further analysis.
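
The workflow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the stop-word list, the `text`/`label` row layout, the `model_dir` path, the assumption that label index 1 means "AI-generated", and the `id`/`generated` CSV columns are all assumptions made for the example. The model step uses the Hugging Face `transformers` library, imported lazily so the other helpers run without it.

```python
import csv
import re

# Tiny illustrative stop-word list (an assumption; the project's real list is not documented).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it"}


def clean_text(text):
    """Lowercase, keep alphabetic characters only, and drop stop words."""
    letters_only = re.sub(r"[^a-z]+", " ", text.lower())
    return " ".join(w for w in letters_only.split() if w not in STOP_WORDS)


def merge_datasets(*datasets):
    """Concatenate lists of {"text", "label"} rows, skipping duplicate texts."""
    seen, merged = set(), []
    for rows in datasets:
        for row in rows:
            if row["text"] not in seen:
                seen.add(row["text"])
                merged.append(row)
    return merged


def predict_ai_probability(texts, model_dir):
    """Score texts with a fine-tuned BERT classifier saved in model_dir (hypothetical path)."""
    # Imported lazily so the helpers above work without these heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    # BERT accepts at most 512 tokens, so longer texts are truncated here.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    # Assumes label index 1 corresponds to "AI-generated".
    return torch.softmax(logits, dim=-1)[:, 1].tolist()


def write_submission(ids, probabilities, out_file):
    """Write one (id, probability) row per text to an open file object."""
    writer = csv.writer(out_file)
    writer.writerow(["id", "generated"])  # assumed submission columns
    for row_id, prob in zip(ids, probabilities):
        writer.writerow([row_id, f"{prob:.4f}"])
```

A typical run would clean and merge the training rows, fine-tune and save a model to `model_dir`, then call `predict_ai_probability` on the test texts and pass the result to `write_submission` with a file opened using `newline=""`.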

Comprehensive Explanation on How BERT Detects AI-Generated Texts

The project includes an in-depth analysis of how a BERT-based classifier separates human-written from AI-generated text. It examines features such as semantic differences, vocabulary usage, statistical distributions, and sentiment measures, and it reviews black-box detection algorithms to explain the mechanisms behind the distinction.
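
To make "vocabulary usage" and "statistical distributions" concrete, here is a small sketch of surface features one might compare between human and AI text. The specific features (type-token ratio, sentence-length statistics, mean word length) and the function name `surface_features` are illustrative assumptions, not the measures the project's analysis necessarily uses.

```python
import re
from statistics import mean, pstdev


def surface_features(text):
    """Compute a few simple vocabulary and length statistics for one text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentence_lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return {
        # Share of distinct words: a rough proxy for vocabulary diversity.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        # Average and spread of sentence length, measured in words.
        "mean_sentence_length": mean(sentence_lengths) if sentence_lengths else 0.0,
        "sentence_length_std": pstdev(sentence_lengths) if sentence_lengths else 0.0,
        "mean_word_length": mean(len(w) for w in words) if words else 0.0,
    }
```

Comparing the distribution of each value across the human and AI subsets of a dataset gives a quick, model-independent view of how the two classes differ.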

Edge Cases

The project also covers edge cases and anomalies in AI-generated text detection, with explanations and possible solutions for each, to make the model more robust.

Notable Points

The project highlights notable points and findings, including observations on the differences between human-authored and AI-generated content. Insights from research papers and analysis provide valuable information for understanding and addressing challenges in AI text detection.

Result Summary

This section summarizes the project's results: model performance, leaderboard (LB) scores, and recommendations for further analysis. It also compares the effectiveness of the different models and techniques tried.
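
For local validation, a score comparable to the leaderboard can be computed by hand. The sketch below assumes the LB score is the area under the ROC curve (ROC AUC) between predicted probabilities and true labels; the function name `roc_auc` and the toy numbers in the usage note are illustrative only, not project results.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for AI-generated, 0 for human-written (assumed encoding).
    scores: predicted probability that each text is AI-generated.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` ranks three of the four positive/negative pairs correctly. The pairwise loop is quadratic, so a library implementation is preferable on large validation sets.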

References for Further Analysis

Various research papers and resources are referenced for further analysis and exploration of AI text detection. These references provide valuable insights and perspectives for continued research and development in the field.

Author

Kairvee Vaswani

[email protected]

kairveeehh/AI-text-detection