NER-for-Chinese-social-media

This term project investigates named entity recognition (NER) on a Chinese social media dataset. It is inspired by the paper "Adversarial Transfer Learning for Chinese Named Entity Recognition with Self-Attention Mechanism" (P. Cao et al., 2018). Various deep learning models and techniques are employed, including LSTMs, CNNs, self-attention, Transformers, and more. The dataset can be accessed at the official repository. The word embeddings are based on the pretrained word2vec versions provided on the paper's GitHub page, with additional preprocessing and refinement, including different positional encoding techniques and combination with the Chinese word segmentation task.
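
As an illustration of one positional encoding technique, the sketch below builds the fixed sinusoidal encoding used in the original Transformer architecture. It is a minimal pure-Python example, not the project's code: the function name `sinusoidal_positional_encoding` and its parameters are hypothetical, and which encodings the project actually compares is documented in report.pdf rather than here.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table (list of lists) of position encodings.

    Hypothetical helper for illustration only. Even columns hold
    sin(pos / 10000^(i / d_model)) and odd columns hold the matching cosine,
    where i is the even column index.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))  # column i
            row.append(math.cos(angle))  # column i + 1
        table.append(row)
    return table


# One encoding vector per character position; in a model these would
# typically be added to (or concatenated with) the character embeddings.
pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
```

Because the table depends only on the position and the embedding width, it can be computed once and reused for every sentence up to the chosen maximum length.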

The project primarily uses PyTorch for model construction. Various implementations and experiments are carried out and compared against the performance reported in the original paper. The results are documented in report.pdf. The preprocessing, training, and evaluation scripts are also included here for reference. The learned model is included as a .pth file under Releases. To evaluate it, run train_combined.py with train set to False; evaluation is then carried out on both the development set and the test set.
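
The evaluation metric is not spelled out in this README. Assuming the entity-level precision, recall, and F1 commonly reported for NER, a scoring step over BIO-tagged sequences might look like the sketch below. The helper names `extract_spans` and `span_f1` and the tag labels in the example are hypothetical and are not taken from train_combined.py or utils.py.

```python
def extract_spans(tags):
    """Collect (start, end, type) entity spans from a BIO tag sequence.

    The end index is exclusive. An I- tag that does not continue an open
    entity of the same type is ignored (a strict reading of BIO).
    """
    spans = []
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def span_f1(gold_sentences, pred_sentences):
    """Return (precision, recall, f1) over exactly matching entity spans."""
    n_gold = n_pred = n_correct = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_spans = set(extract_spans(gold))
        pred_spans = set(extract_spans(pred))
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
        n_correct += len(gold_spans & pred_spans)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the prediction finds the person but misses the location.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
precision, recall, f1 = span_f1(gold, pred)
```

Scoring whole spans rather than individual tags means a prediction counts as correct only when both the boundaries and the entity type match the gold annotation.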

Repository files:

- LICENSE
- README.md
- model_combined.py
- preprocess_em.py
- report.pdf
- train_combined.py
- utils.py

ck44liu/NER-for-Chinese-social-media

Folders and files

Latest commit

History

Repository files navigation

NER-for-Chinese-social-media

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages