Transformers without Tears: Improving the Normalization of Self-Attention
Self-attention Does Not Need O(n²) Memory
Language Modelling with Pixels
Post-hoc Interpretability for Neural NLP: A Survey
Scaling Laws and Interpretability of Learning from Repeated Data
Compositional Attention: Disentangling Search and Retrieval
Automated Concatenation of Embeddings for Structured Prediction
A Knowledge-based System for Multilingual Named Entity Recognition
Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study
A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition
Boundary Smoothing for Named Entity Recognition
Block Pruning For Faster Transformers
Natural Language Descriptions of Deep Visual Features
OCR-free Document Understanding Transformer
A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks and Datasets
OPT: Open Pre-trained Transformer Language Models
Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors
Papers that are more general and not limited to NLP (or even focused only on other tasks)
Towards a Unified View of Parameter-Efficient Transfer Learning
Learning Pruning-Friendly Networks via Frank-Wolfe: One-Shot, Any-Sparsity, And No Retraining
8-bit Optimizers via Block-wise Quantization
StyleAlign: Analysis and Applications of Aligned StyleGAN Models
Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt