Part-of-Speech (POS) tagging

Background

Part-of-speech tagging is the task of assigning a part-of-speech tag (from a given tag set) to every word in a given sentence.

Example

Input:

快速 的 棕色 狐狸 跳过 了 懒惰 的 狗

Output:

[快速] VA [的] DEC [棕色] NN [狐狸] NN [跳过] VV [了] AS [懒惰] VA [的] DEC [狗] NN
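As a minimal sketch, the bracketed output above can be read back into (word, tag) pairs with a regular expression. The `[word] TAG` layout is assumed here to be only this page's display convention, not a format any particular tagger is known to emit, and `parse_tagged` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumed layout: "[word] TAG [word] TAG ...", as in the example on this page.
TOKEN_RE = re.compile(r"\[([^\]]+)\]\s+(\S+)")


def parse_tagged(line: str) -> List[Tuple[str, str]]:
    """Return (word, tag) pairs in the order they appear in the line."""
    return TOKEN_RE.findall(line)


example = "[快速] VA [的] DEC [棕色] NN [狐狸] NN [跳过] VV [了] AS [懒惰] VA [的] DEC [狗] NN"
pairs = parse_tagged(example)
words = [word for word, _ in pairs]
tags = [tag for _, tag in pairs]
```

Joining `words` with spaces reproduces the segmented input sentence, and `tags` holds one POS tag per word.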

Standard Metrics

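F1 score: the harmonic mean of word-level precision and word-level recall, computed on the joint segmentation and tagging task. A predicted word counts as correct only if both its boundaries and its POS tag match the gold standard.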
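The metric can be sketched as follows. This is an illustration under stated assumptions, not the evaluation script linked in the dataset sections below. Each word is mapped to a (start, end, tag) span over character offsets, and a predicted span is counted as correct only when an identical span exists in the gold sentence. The names `to_spans` and `joint_f1` are hypothetical.

```python
from typing import List, Set, Tuple

TaggedWord = Tuple[str, str]  # (word, POS tag)
Span = Tuple[int, int, str]   # (start offset, end offset, POS tag)


def to_spans(sentence: List[TaggedWord]) -> Set[Span]:
    """Map each tagged word to a span over character offsets in the sentence."""
    spans: Set[Span] = set()
    start = 0
    for word, tag in sentence:
        end = start + len(word)
        spans.add((start, end, tag))
        start = end
    return spans


def joint_f1(gold: List[List[TaggedWord]],
             pred: List[List[TaggedWord]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), with counts pooled over all sentences."""
    n_correct = n_gold = n_pred = 0
    for gold_sent, pred_sent in zip(gold, pred):
        gold_spans, pred_spans = to_spans(gold_sent), to_spans(pred_sent)
        n_correct += len(gold_spans & pred_spans)  # same boundaries and same tag
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# The prediction wrongly merges 棕色 and 狐狸 into a single word.
gold = [[("快速", "VA"), ("的", "DEC"), ("棕色", "NN"), ("狐狸", "NN")]]
pred = [[("快速", "VA"), ("的", "DEC"), ("棕色狐狸", "NN")]]
precision, recall, f1 = joint_f1(gold, pred)
```

In this toy case two of the three predicted words match the gold standard, so precision is 2/3, recall is 2/4, and F1 is 4/7. A single segmentation error costs both precision and recall, which is why the joint task is scored with F1 rather than per-token accuracy.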

Chinese Treebank Datasets.

Task originally defined in Ng and Low (2004)
Released by the LDC. An LDC licence is required to obtain the datasets
Link: https://verbs.colorado.edu/chinese/ctb.html
Tag set has 33 POS tags

Test set	# words (dev)	# words (test)	Genre
CTB5	6,821	8,008	News

Metrics

Implementation: https://github.com/yanshao9798/tagger/blob/master/evaluation.py

Results

System	F1 score
Tian et al. (2020)	96.92
Meng et al. (2019) (Glyce + BERT)	96.61
Meng et al. (2019) (BERT)	96.06
Shao et al. (2017)	94.38

Resources

Train set	# words	Genre
CTB5	493,935	News

Universal Dependencies Datasets.

Available freely (GPL or equivalent licence)
https://universaldependencies.org/
Paper describing the dataset: Nivre et al. (2016)
Tag set has 15 POS tags

Test set	# words (dev)	# words (test)	Genre
UD Chinese	12,663	12,012	Learner essays, news, spoken language, Wiki

Metrics

Implementation: https://github.com/yanshao9798/tagger/blob/master/evaluation.py

Results

System	F1 score
Meng et al. (2019) (Glyce + BERT)	96.14
Tian et al. (2020)	95.69
Meng et al. (2019) (BERT)	94.79
Shao et al. (2017)	89.75

Resources

Train set	# words	Genre
UD Chinese	98,608	Learner essays, news, spoken language, Wiki

Suggestions? Changes? Please send email to chinesenlp.xyz@gmail.com
