Commit

Merge pull request #43 from entelecheia/main
entelecheia authored Sep 7, 2024
2 parents c553fd9 + b36daf6 commit d927921
Showing 33 changed files with 4,963 additions and 167 deletions.
7 changes: 7 additions & 0 deletions book/en/_toc.yml
@@ -10,10 +10,17 @@ parts:
        sections:
          - file: week01/session1
          - file: week01/session2
          - file: week01/wk1-lab1
      - file: week02/index
        sections:
          - file: week02/session1
          - file: week02/session2
          - file: week02/session3
  - caption: Projects
    chapters:
      - file: projects/index
      - file: projects/proposal
      - file: projects/research-note
  - caption: About
    chapters:
      - file: syllabus/index
79 changes: 79 additions & 0 deletions book/en/projects/research-note.md
@@ -0,0 +1,79 @@
# Week [n] Project Research Note

## Basic Information

- **Team Name**: [Enter team name]
- **Project Name**: [Enter project name]
- **Week**: Week [n]

## Team Member Activity Summary

| Name | Role | Key Activities | Next Week's Plan |
| ------- | ------ | -------------------------------- | ------------------------ |
| [Name1] | [Role] | • [Activity1] <br> • [Activity2] | • [Plan1] <br> • [Plan2] |
| [Name2] | [Role] | • [Activity1] <br> • [Activity2] | • [Plan1] <br> • [Plan2] |
| [Name3] | [Role] | • [Activity1] <br> • [Activity2] | • [Plan1] <br> • [Plan2] |
| [Name4] | [Role] | • [Activity1] <br> • [Activity2] | • [Plan1] <br> • [Plan2] |

## Weekly Goal Achievement

| Goal | Status | Notes |
| ------- | ------------------------------- | ------------------------ |
| [Goal1] | [Completed/In Progress/Delayed] | [Additional explanation] |
| [Goal2] | [Completed/In Progress/Delayed] | [Additional explanation] |
| [Goal3] | [Completed/In Progress/Delayed] | [Additional explanation] |

## Key Achievements and Deliverables

1. [Description of key achievement or deliverable 1]
2. [Description of key achievement or deliverable 2]
3. [Description of key achievement or deliverable 3]

## Technical Challenges and Solutions

1. **Challenge 1**: [Description of the challenge]
   - Solution: [Description of the solution]
2. **Challenge 2**: [Description of the challenge]
   - Solution: [Description of the solution]

## Learning Outcomes

1. [Learning topic 1]
   - Key points: [Brief explanation]
   - Application: [How it can be applied to the project]
2. [Learning topic 2]
   - Key points: [Brief explanation]
   - Application: [How it can be applied to the project]

## Next Week's Plan

1. [Plan 1]
2. [Plan 2]
3. [Plan 3]

## Other Notable Items

- [Notable item 1]
- [Notable item 2]

## Team Meeting Summary

- **Date and Time**: YYYY-MM-DD HH:MM
- **Attendees**: [List of attendees]
- **Key Discussion Points**:
  1. [Discussion point 1]
  2. [Discussion point 2]
  3. [Discussion point 3]
- **Decisions Made**:
  1. [Decision 1]
  2. [Decision 2]

## Attachments

1. [Description and link to attachment 1]
2. [Description and link to attachment 2]

---

Date of Entry: YYYY-MM-DD
Logged by: [Name of logger]
2 changes: 1 addition & 1 deletion book/en/week01/index.md
@@ -1,4 +1,4 @@
-# Week 1: Introduction
+# Week 1 - Introduction

Welcome to Week 1 of our course on Natural Language Processing (NLP) and advanced language model technologies. This week, we'll embark on an exciting journey through the world of NLP, exploring its fundamental concepts, historical evolution, and cutting-edge developments.

14 changes: 1 addition & 13 deletions book/en/week01/session1.md
@@ -1,4 +1,4 @@
-# Session 1 - Foundations and Evolution of NLP
+# Week 1 Session 1 - Foundations and Evolution of NLP

## 1. Introduction to Natural Language Processing (NLP)

@@ -74,18 +74,6 @@ ORGANIZATION Google
GPE New York
```

-### 1.3 Importance in Social Science Research
-
-NLP has become increasingly important in social science research due to its ability to:
-
-1. Analyze large-scale textual data, such as social media posts, historical documents, or survey responses
-2. Extract insights from unstructured text, revealing patterns and trends in human communication
-3. Automate content analysis and coding, saving time and reducing human bias in qualitative research
-4. Facilitate cross-cultural and multilingual studies by enabling automated translation and analysis
-5. Enhance sentiment analysis and opinion mining for understanding public perceptions and attitudes
-
-Example: A researcher studying political discourse could use NLP techniques to analyze thousands of tweets during an election campaign, identifying key topics, sentiment towards candidates, and changes in public opinion over time.
-
## 2. Historical Perspective of NLP

```{mermaid}
2 changes: 1 addition & 1 deletion book/en/week01/session2.md
@@ -1,4 +1,4 @@
-# Session 2 - The Revolution in Modern NLP
+# Week 1 Session 2 - The Revolution in Modern NLP

## 6. Evolution Towards Modern NLP

390 changes: 390 additions & 0 deletions book/en/week01/wk1-lab1.ipynb

Large diffs are not rendered by default.

66 changes: 66 additions & 0 deletions book/en/week02/index.md
@@ -0,0 +1,66 @@
# Week 2 - Basics of Text Preprocessing

## Overview

This week covers the fundamental techniques of text preprocessing, a crucial step in any Natural Language Processing (NLP) pipeline. Preprocessing cleans and standardizes raw text so that it is suitable for further analysis and model training.

## Learning Objectives

By the end of this week, you will be able to:

1. Understand the importance of text preprocessing in NLP tasks
2. Implement and apply various tokenization techniques
3. Perform text normalization, including case normalization and punctuation removal
4. Identify and remove stop words from text data
5. Use the NLTK (Natural Language Toolkit) library for text preprocessing tasks

## Key Topics

### 1. Tokenization

- Definition and importance of tokenization
- Word tokenization vs. sentence tokenization
- Challenges in tokenization (e.g., contractions, hyphenated words)
- Different tokenization approaches (rule-based, statistical, neural)
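
As a rough illustration of the rule-based approach, the sketch below tokenizes words and sentences with Python's standard `re` module. The two patterns and the function names `simple_word_tokenize` and `simple_sentence_tokenize` are assumptions made for this example, not part of the course materials; real tokenizers handle many more edge cases (abbreviations, URLs, numbers).

```python
import re

# Assumed rule: a token is a word that may contain internal hyphens or
# apostrophes (so contractions and hyphenated words stay whole), or a
# single punctuation character.
WORD_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

# Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def simple_word_tokenize(text):
    """Split text into word and punctuation tokens."""
    return WORD_PATTERN.findall(text)


def simple_sentence_tokenize(text):
    """Split text into sentences at terminal punctuation."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


sample = "Don't split state-of-the-art models. They work!"
print(simple_word_tokenize(sample))
print(simple_sentence_tokenize(sample))
```

The sentence rule is deliberately naive: it would also split after an abbreviation such as "Dr.", which is one reason statistical and neural tokenizers exist.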

### 2. Normalization

- Case normalization (lowercasing/uppercasing)
- Punctuation removal
- Handling special characters and numbers
- Spelling correction and text canonicalization
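
The normalization steps listed above can be sketched with the standard library alone. The function name `normalize`, the `NUM` placeholder, and the order of the steps are assumptions chosen for illustration; spelling correction is left out because it needs a dictionary or a model.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text, replace_numbers=True):
    """Lowercase text, optionally mask numbers, and strip punctuation."""
    text = text.lower()
    if replace_numbers:
        # Mask integers and decimals before punctuation removal, so that
        # "3.50" becomes one placeholder instead of the digits "350".
        text = re.sub(r"\d+(?:\.\d+)?", "NUM", text)
    text = text.translate(_PUNCT_TABLE)
    # Collapse the whitespace runs left behind by removed characters.
    return re.sub(r"\s+", " ", text).strip()


print(normalize("Hello, World! It costs $3.50."))
```

Whether to mask, keep, or drop numbers depends on the task; the `replace_numbers` flag only makes that choice explicit.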

### 3. Stop Word Removal

- Definition and purpose of stop words
- Common stop words in English
- Impact of stop word removal on NLP tasks
- Considerations for domain-specific stop words
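
A minimal sketch of stop word filtering follows. The tiny `BASIC_STOP_WORDS` set and the `extra_stop_words` parameter are hypothetical, introduced only to show how a general list can be extended with domain-specific terms.

```python
# Hypothetical, deliberately tiny stop word list; real lists are much longer.
BASIC_STOP_WORDS = frozenset(
    {"a", "an", "the", "is", "are", "of", "in", "to", "and"}
)


def remove_stop_words(tokens, extra_stop_words=()):
    """Drop tokens found in the basic list or in the domain-specific extras."""
    stop_words = BASIC_STOP_WORDS | {w.lower() for w in extra_stop_words}
    return [t for t in tokens if t.lower() not in stop_words]


tokens = ["The", "patient", "is", "in", "the", "clinic"]
print(remove_stop_words(tokens))
# In a medical corpus "patient" may carry little signal, so treat it as a
# domain-specific stop word.
print(remove_stop_words(tokens, extra_stop_words={"patient"}))
```

Removing stop words can hurt tasks where function words matter, for example negation ("not") in sentiment analysis, so the list is worth reviewing per task.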

### 4. NLTK Library for Text Preprocessing

- Introduction to NLTK
- Using NLTK for tokenization
- NLTK's built-in stop word lists
- Additional NLTK preprocessing utilities
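
The following sketch shows how NLTK might be used for these tasks, assuming the `nltk` package is installed. The wrapper names `nltk_tokenize` and `nltk_english_stop_words` are hypothetical. The two tokenizers used here are rule-based and need no extra data, whereas the stop word list requires the `stopwords` corpus, which the second function tries to download.

```python
def nltk_tokenize(text):
    """Tokenize text with two NLTK tokenizers that need no data downloads."""
    # Imported inside the function so this module still loads when NLTK
    # is not installed.
    from nltk.tokenize import TreebankWordTokenizer, wordpunct_tokenize

    return {
        # Penn Treebank conventions: contractions are split ("Do", "n't").
        "treebank": TreebankWordTokenizer().tokenize(text),
        # Splits on every run of punctuation, including apostrophes.
        "wordpunct": wordpunct_tokenize(text),
    }


def nltk_english_stop_words():
    """Return NLTK's English stop word list as a set."""
    import nltk
    from nltk.corpus import stopwords

    # Needs network access the first time the corpus is fetched.
    nltk.download("stopwords", quiet=True)
    return set(stopwords.words("english"))
```

Calling `nltk_tokenize("Don't panic.")` makes the difference between the two tokenizers visible on a contraction, which is one of the tokenization challenges listed earlier.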

## Practical Component

In this week's practical session, you will:

- Install and set up the NLTK library
- Implement a text preprocessing pipeline using NLTK
- Experiment with different tokenization methods
- Compare the effects of various preprocessing steps on sample texts
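
One possible way to compare the effect of each step is to record the tokens after every stage of a pipeline. The sketch below uses only the standard library rather than NLTK; the step functions, the `STOP_WORDS` list, and `run_pipeline` are assumptions made for illustration.

```python
import string

# Hypothetical short stop word list used only for this demonstration.
STOP_WORDS = frozenset({"the", "is", "a", "of", "and", "to", "in"})
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def lowercase(tokens):
    return [t.lower() for t in tokens]


def strip_punctuation(tokens):
    stripped = (t.translate(_PUNCT_TABLE) for t in tokens)
    return [t for t in stripped if t]  # drop tokens that became empty


def drop_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def run_pipeline(text, steps):
    """Apply each step in order and record the tokens after every step."""
    tokens = text.split()  # whitespace tokenization as a simple baseline
    trace = [("tokenize", tokens)]
    for step in steps:
        tokens = step(tokens)
        trace.append((step.__name__, tokens))
    return trace


trace = run_pipeline(
    "The cat is on the mat.",
    [lowercase, strip_punctuation, drop_stop_words],
)
for name, tokens in trace:
    print(f"{name:18} {len(tokens):2d} {tokens}")
```

Reordering the steps changes the result: running `drop_stop_words` before `lowercase` would keep "The", because the stop word list is lowercase.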

## Assignment

You will be given a dataset of raw text and tasked with creating a comprehensive preprocessing pipeline. Your solution should include tokenization, normalization, and stop word removal. You'll also need to provide a brief report discussing the impact of each preprocessing step on the resulting text.

## Looking Ahead

The text preprocessing skills you learn this week will form the foundation for more advanced NLP tasks we'll explore in the coming weeks. Next week, we'll build upon these basics to delve into the fundamentals of language models.

```{tableofcontents}
```