BERT: Bidirectional Encoder Representations From Transformers
AI, But Simple Issue #46

Hello from the AI, but simple team! If you enjoy our content, consider supporting us so we can keep doing what we do.
Our newsletter is no longer sustainable to run at no cost, so we’re relying on different measures to cover operational expenses. Thanks again for reading!
The Bidirectional Encoder Representations from Transformers (BERT) model is a transformer-based language model that fundamentally changed the field of natural language processing (NLP), becoming a foundational model for a wide range of NLP tasks.
BERT was revealed in Google’s groundbreaking 2018 paper, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” which has been cited over 120,000 times.

BERT’s introduction spurred the development of countless additional transformer-based models (such as RoBERTa, GPT-2, and T5), leading to a wave of innovations in NLP that continues to influence research and applications today.
BERT introduced a novel approach with its bidirectional transformers. Before BERT, most models processed text in one direction: either left-to-right or right-to-left.
As a transformer, BERT uses the self-attention mechanism to weigh the significance of each word based on its context—both the context to the left and to the right of the word.
This bidirectional processing is what makes BERT better at understanding context than standard unidirectional transformers.
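To see this in action, here is a minimal sketch (our own example, not code from the BERT paper) using the Hugging Face transformers library's fill-mask pipeline with the pretrained bert-base-uncased checkpoint. BERT has to use the words on both sides of the [MASK] token to make a sensible prediction:

```python
# Minimal sketch: BERT filling in a masked word
# (requires `pip install transformers torch`).
# The model reads the context on BOTH sides of [MASK] to make its guess.
from transformers import pipeline

# "bert-base-uncased" is the original pretrained BERT checkpoint.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The right-hand context ("to buy some milk") is what steers the model
# toward words like "store" or "market" rather than, say, "gym".
for prediction in unmasker("He went to the [MASK] to buy some milk."):
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")
```

A purely left-to-right model would only see "He went to the" at the masked position, which is far less informative than seeing the full sentence.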
Want to learn more about transformers and how they work? Check out our past issue here.
This is an oversimplification, but think of it this way: BERT reads and then re-reads the sentence to gain a deeper understanding of each word’s meaning. It builds both a left and a right context for each word, rather than only the left context used by standard unidirectional transformers.
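As a rough illustration of that difference (again, our own toy sketch, not code from the paper), compare the attention masks: a left-to-right model uses a causal (lower-triangular) mask, so each token can only attend to earlier positions, while BERT's encoder lets every token attend to every position in the sentence:

```python
import numpy as np

# Toy illustration of which positions each token may attend to.
tokens = ["the", "bank", "by", "the", "river"]
n = len(tokens)

causal_mask = np.tril(np.ones((n, n), dtype=int))   # left-to-right model: token i sees tokens 0..i
bidirectional_mask = np.ones((n, n), dtype=int)     # BERT-style encoder: every token sees all tokens

# The row for "bank" (index 1) shows the difference: under the causal mask it
# sees only "the bank", but under the bidirectional mask it also sees
# "by the river", which is exactly what disambiguates the word "bank".
print("Visible to 'bank' (causal):       ",
      [t for t, m in zip(tokens, causal_mask[1]) if m])
print("Visible to 'bank' (bidirectional):",
      [t for t, m in zip(tokens, bidirectional_mask[1]) if m])
```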