Improving BERT Pretraining with Syntactic Supervision
Poster @ Learning with Small Data
September 11, 2023
A proof-of-concept study with promising results on embedding syntactic biases into bidirectional transformers with no compute overhead.