The mathematical trick to speed up transformers

A few weeks ago, researchers from Google, the University of Cambridge, DeepMind and the Alan Turing Institute released the paper “Rethinking Attention with Performers”, which proposes a solution to the softmax bottleneck in transformers [1]. Their approach relies on a clever mathematical trick, which I will explain in this article.


Topics covered:

Why transformers?

In essence, the Transformer is a model designed to work efficiently with sequential data, and it is in fact employed heavily in Natural Language Processing (NLP) tasks, which require…

Why does minimizing the norms induce regularization?

If you’ve taken an introductory Machine Learning class, you’ve certainly come across the issue of overfitting and been introduced to the concepts of regularization and norms. I often see this discussed purely in terms of formulas, so I figured I’d use some visual examples to give better insight into why exactly minimizing the norm induces regularization, and how L1 and L2 differ from each other.

Prerequisite knowledge

Topics covered

Recap of regularization


Evaluating NLP models using the weighted branching factor

In this post I will give a detailed overview of perplexity as it is used in Natural Language Processing (NLP), covering the two ways in which it is normally defined and the intuitions behind them.


1. A quick recap of language models

Skip if not needed

A language…

How to represent and manipulate data in graph form

Photo by Clint Adair

This is a summary of some of the key points from the paper “A Review of Relational Machine Learning for Knowledge Graphs (28 Sep 2015)” [1], which gives a nice introduction to Knowledge Graphs and some of the methods used to build and expand them.

The key takeaway

Information can be structured in the form of a graph, with nodes representing entities and edges representing relationships between entities. A knowledge graph can be built manually or using automatic information extraction methods on some source text (for example Wikipedia). …
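
As a rough illustration of the idea above, here is a minimal Python sketch of a graph stored as (subject, relation, object) triples, where entities are nodes and relations are labelled edges. The example facts and the helper names `build_graph` and `neighbours` are my own assumptions for illustration; they are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical example facts, each stored as a (subject, relation, object) triple.
triples = [
    ("Leonard Nimoy", "played", "Spock"),
    ("Spock", "characterIn", "Star Trek"),
    ("Leonard Nimoy", "starredIn", "Star Trek"),
]


def build_graph(triples):
    """Index triples by subject: entity -> list of (relation, object) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def neighbours(graph, entity, relation=None):
    """Return the entities reachable from `entity`, optionally filtered by relation."""
    return [
        obj
        for rel, obj in graph.get(entity, [])
        if relation is None or rel == relation
    ]


graph = build_graph(triples)
print(neighbours(graph, "Leonard Nimoy"))            # all outgoing edges
print(neighbours(graph, "Leonard Nimoy", "played"))  # only "played" edges
```

An adjacency list keyed by subject keeps lookups of an entity's outgoing relationships cheap; a real knowledge graph would typically also index by object and by relation type.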

Chiara Campagnola

Machine Learning student at UCL
