Improving BERT Pretraining with Syntactic Supervision

Poster @ Learning with Small Data
September 11, 2023


A proof-of-concept study, with promising results, on embedding syntactic biases into bidirectional transformers with no additional compute overhead.