Annotated Bibliography
This is a crowdsourced annotated bibliography of research and resources related to BERT-like models.
If you’d like to add to the bibliography, you can do so in this Dropbox document. We will update the bibliography on this web page periodically.
Technical Readings
- “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, 2018.
- Original paper that introduced BERT, authored by researchers at Google AI
- “Contextual Embeddings: When Are They Worth It?” Simran Arora, Avner May, Jian Zhang, and Christopher Ré, 2020.
- “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter,” Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf, 2019.
- Helpful for teaching students how to use BERT-like models without extensive computational resources (see the loading sketch at the end of this list)
- “A Primer in BERTology: What We Know About How BERT Works,” Anna Rogers, Olga Kovaleva, and Anna Rumshisky, 2020.
- A survey of 150+ studies of BERT that explores what BERT “knows” and how it might be improved. Very technical, with a strong focus on model architecture
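For readers who want to try the lightweight models the DistilBERT entry describes, here is a minimal sketch of loading DistilBERT with the Hugging Face transformers library. The checkpoint name and the mean-pooling step are illustrative assumptions, not anything prescribed by the paper.

```python
# Minimal sketch: embedding a sentence with DistilBERT via Hugging Face
# transformers. Assumes `pip install transformers torch`; the checkpoint
# name and mean-pooling strategy are illustrative choices, not the only ones.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("BERT-like models can run on a laptop.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final hidden states into a single sentence vector.
sentence_vector = outputs.last_hidden_state.mean(dim=1)
print(sentence_vector.shape)  # torch.Size([1, 768])
```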
Tutorials & Primers
- “The Illustrated BERT, ELMo, and Co. (How NLP Cracked Transfer Learning),” Jay Alammar, December 2018.
- Helpful, though quite technical for a humanities audience
Risks & Ethical Concerns
- “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜,” Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, 2021.
- This paper discusses the risks and ethical concerns of large language models like BERT, including biased and poorly documented training data as well as financial and environmental costs
- “Extracting Training Data from Large Language Models,” Nicholas Carlini et al., December 2020.
- “Privacy Considerations in Large Language Models” (Blog post), Nicholas Carlini, December 2020.
Applied Humanities
- “Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4,” Kent Chang, Mackenzie Cramer, Sandeep Soni, and David Bamman, EMNLP 2023. Code
- “Do Humanists Need BERT?” Ted Underwood, July 2019.
- Overview of BERT and an assessment of its usefulness when applied to sentiment analysis of movie reviews and genre classification of books
- “Literary Event Detection,” Matthew Sims, Jong Ho Park, and David Bamman, 2019.
- “An Annotated Dataset of Coreference in English Literature,” David Bamman, Olivia Lewke, and Anya Mansoor, 2020.
- “Latin BERT: A Contextual Language Model for Classical Philology,” David Bamman and Patrick Burns, 2020.
- MacBERTh (BERT for Early Modern English), Lauren Fonteyn.
- “Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling,” Xiaochuang Han and Jacob Eisenstein, 2019.
- Domain-adaptive fine-tuning on Early Modern English and Twitter (a fine-tuning sketch follows this list)
- “What about Grammar? Using BERT Embeddings to Explore Functional-Semantic Shifts of Semi-Lexical and Grammatical Constructions,” Lauren Fonteyn, 2020.
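The Han and Eisenstein entry above describes domain-adaptive fine-tuning: continuing BERT’s masked-language-model pretraining on in-domain text before the downstream task. Below is a minimal, hedged sketch of that idea using the Hugging Face transformers Trainer; the corpus file name, checkpoint, and hyperparameters are illustrative assumptions, not the authors’ exact setup.

```python
# Sketch of domain-adaptive fine-tuning: continue masked-language-model
# pretraining on in-domain text (e.g., Early Modern English or tweets).
# Assumes `pip install transformers datasets torch`; "domain_corpus.txt",
# the checkpoint, and all hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# One raw sentence per line in the (hypothetical) in-domain corpus file.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# The collator randomly masks 15% of tokens, as in BERT's pretraining.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-domain-adapted",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
    data_collator=collator)
trainer.train()  # afterwards, fine-tune on the labeling task as usual
```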
Critical Humanities
- “Playing With Unicorns: AI Dungeon and Citizen NLP,” Minh Hua and Rita Raley, Digital Humanities Quarterly, 2020.
Tools
- Easy-Bert, Rob Rua
- Simple API for BERT
- bert-as-service, Han Xiao
- Using BERT as a sentence encoder (a client sketch follows this list)
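For readers curious about the sentence-encoder workflow, here is a minimal sketch of the bert-as-service client. It assumes the companion server has already been started separately and pointed at a downloaded BERT checkpoint; the example sentences are placeholders.

```python
# Minimal sketch of querying a running bert-as-service server.
# Assumes `pip install bert-serving-client` and that a server was started
# separately, e.g. `bert-serving-start -model_dir /path/to/bert-checkpoint`.
from bert_serving.client import BertClient

bc = BertClient()  # connects to a server on localhost by default
vectors = bc.encode(["Do humanists need BERT?",
                     "Sentence encoders map text to fixed-length vectors."])
print(vectors.shape)  # (2, 768) for a BERT-base checkpoint
```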
Educational Resources
- “Using BERT for next sentence prediction,” Ted Underwood, adapted and used in Dan Sinykin’s Emory course “Practical Approaches to Data Science with Text,” 2020.
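Since the teaching resource above centers on next sentence prediction, here is a minimal, hedged sketch of that task using BERT’s pretrained NSP head via the Hugging Face transformers library; it illustrates the general technique and is not the course notebook’s actual code. The sentence pair is a made-up example.

```python
# Sketch of BERT's next-sentence-prediction head via Hugging Face
# transformers (not the course notebook's actual code). Assumes
# `pip install transformers torch`; the sentence pair is illustrative.
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

sentence_a = "She opened the old novel to its first page."
sentence_b = "The opening line was already familiar."
inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 = "sentence B follows sentence A"; index 1 = "B is random".
probs = torch.softmax(logits, dim=-1)
print(f"P(B follows A) = {probs[0, 0]:.3f}")
```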