NLP Landscape Shift

The NLP landscape has changed quite a bit in the past year. The last time I was reading NLP papers was probably around May 2018. I remember reading Google's "Attention Is All You Need". I even used it as a base model for a project and improved performance on the WMT'14 German <-> English translation task by a few BLEU points. The project report I wrote on that can be found in my repo: NMT for Morphologically rich languages

But what I didn't realize at that point was that almost all NLP researchers and practitioners would shift away from gated recurrent models to fully attention-based models. While I have been passively reading news articles on BERT and GPT, I think it's time I understood them better. I am hoping to write a series of posts on transformer-based models, to deepen my understanding by writing things down.