Interactive Positional Encoding

Visualizing Positional Encoding

An interactive guide to understanding how Transformers encode position.

The Core Equations

$$PE(pos, 2i) = \sin\left(\frac{pos}{10000^{2i/d}}\right)$$
$$PE(pos, 2i+1) = \cos\left(\frac{pos}{10000^{2i/d}}\right)$$

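These equations generate a unique vector for each position (pos) in a sequence, using sine and cosine waves of varying frequencies determined by the dimension index (i). Here d is the encoding dimension, and i runs from 0 to d/2 − 1, so each value of i yields one sine/cosine pair.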
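As a minimal sketch of the two equations above, the following pure-Python function fills a matrix with one row per position. The function name `positional_encoding`, the `base` parameter, and the choice of 50 positions with d = 16 are illustrative assumptions, not part of the original page.

```python
import math

def positional_encoding(num_positions, d, base=10000.0):
    """Return a num_positions x d nested list of sinusoidal encodings."""
    if d % 2 != 0:
        raise ValueError("d must be even so dimensions pair up as (sin, cos)")
    pe = [[0.0] * d for _ in range(num_positions)]
    for pos in range(num_positions):
        for i in range(d // 2):
            # Same angle for the sine (even) and cosine (odd) dimension of pair i.
            angle = pos / base ** (2 * i / d)
            pe[pos][2 * i] = math.sin(angle)
            pe[pos][2 * i + 1] = math.cos(angle)
    return pe

pe = positional_encoding(num_positions=50, d=16)
# At position 0 every angle is 0, so sine entries are 0 and cosine entries are 1.
print([round(v, 3) for v in pe[0]])
```

Each row of `pe` corresponds to one position, and each adjacent pair of columns shares one frequency.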

Positional Encoding Vector

This chart shows the final encoding vector for a given position. Drag the sliders to see how it changes.

Underlying Sine & Cosine Waves

Each pair of dimensions comes from a sine/cosine wave pair; the wavelength of each pair is listed below. The vertical line shows the current position.

Note: Rendering may be slow with high dimension values.

Full Positional Encoding Matrix

This heatmap shows the entire PE matrix. The highlighted row corresponds to the selected position.

Position (pos)

Encoding Dimension (d)

How do these encodings help differentiate positions?

Absolute Position: The unique mix of frequencies ensures that every position in the sequence is assigned a distinct vector, as seen in the comparison charts below.

Relative Position: The encoding for pos+k is a linear transformation (a rotation) of the encoding for pos, and that transformation depends only on the offset k, not on pos itself. This means the relationship between positions is consistent: the Cosine Similarity for any two positions with the same offset (e.g., 7 to 8 vs. 22 to 23) is identical.

Note: The sine and cosine functions have values in [-1, 1], which keeps the values of the positional encoding matrix in a normalized range.
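The offset claim can be checked numerically. The sketch below uses hypothetical helpers (`encode`, `cosine_similarity`, `euclidean_distance`) and an assumed d = 16 to compare the pairs (7, 8) and (22, 23) mentioned above.

```python
import math

def encode(pos, d, base=10000.0):
    """Sinusoidal encoding vector for a single position."""
    vec = []
    for i in range(d // 2):
        angle = pos / base ** (2 * i / d)
        vec += [math.sin(angle), math.cos(angle)]
    return vec

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d = 16
sim_7_8 = cosine_similarity(encode(7, d), encode(8, d))
sim_22_23 = cosine_similarity(encode(22, d), encode(23, d))
dist_7_8 = euclidean_distance(encode(7, d), encode(8, d))
dist_22_23 = euclidean_distance(encode(22, d), encode(23, d))

# Both pairs share offset 1, so the two measures should agree up to rounding.
print(abs(sim_7_8 - sim_22_23) < 1e-9, abs(dist_7_8 - dist_22_23) < 1e-9)
```

The agreement follows because each sine/cosine pair contributes cos(k/T_i) to the dot product, which involves only the offset k, and every encoding vector has the same norm.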

You can prove this yourself!

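The fact that PE(pos+k) is a linear transformation of PE(pos) follows from the angle addition formulas. For each sine/cosine pair it can be expressed as a rotation matrix, where $T_i = 10000^{2i/d}$ and $p_{j,pos}$ denotes PE(pos, j):

$$\begin{bmatrix} p_{2i,\,pos+k} & p_{2i+1,\,pos+k} \end{bmatrix} = \begin{bmatrix} p_{2i,\,pos} & p_{2i+1,\,pos} \end{bmatrix} \cdot \begin{bmatrix} \cos(k/T_i) & -\sin(k/T_i) \\ \sin(k/T_i) & \cos(k/T_i) \end{bmatrix}$$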
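One way to check the rotation identity is to apply the 2x2 matrix to each sine/cosine pair and compare the result with the encoding computed directly at pos+k. The helper `rotate_pair` and the values d = 16, pos = 7, k = 5 are assumptions chosen for illustration.

```python
import math

def rotate_pair(sin_val, cos_val, k, T):
    """Multiply the row vector [sin_val, cos_val] by the rotation matrix
    [[cos(k/T), -sin(k/T)], [sin(k/T), cos(k/T)]]."""
    c, s = math.cos(k / T), math.sin(k / T)
    new_sin = sin_val * c + cos_val * s
    new_cos = -sin_val * s + cos_val * c
    return new_sin, new_cos

d, pos, k = 16, 7, 5
for i in range(d // 2):
    T = 10000.0 ** (2 * i / d)
    rotated = rotate_pair(math.sin(pos / T), math.cos(pos / T), k, T)
    direct = (math.sin((pos + k) / T), math.cos((pos + k) / T))
    # The rotated pair should match the encoding evaluated at pos + k.
    assert all(abs(r - t) < 1e-12 for r, t in zip(rotated, direct))

print("rotation identity holds for every dimension pair")
```

The matrix entries depend only on k and T_i, which is why the same transformation works for any starting position.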

Cosine Similarity

Euclidean Distance

Vector for Position 7

Vector for Position 8

Frequency & Wavelength Analysis

The core of this encoding lies in using waves of different frequencies. The term $1 / 10000^{2i/d}$ controls the frequency. As the dimension index i increases, the frequency decreases, and the wavelength (λ) increases according to the formula: $\lambda_i = 2\pi \cdot 10000^{2i/d}$.
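To make the growth in wavelength concrete, the sketch below evaluates the wavelength formula for each dimension index. The function name `wavelength` and the value d = 16 are illustrative assumptions.

```python
import math

def wavelength(i, d, base=10000.0):
    """Wavelength, in positions, of the sine/cosine pair with index i."""
    return 2 * math.pi * base ** (2 * i / d)

d = 16
for i in range(d // 2):
    print(f"i={i}: wavelength = {wavelength(i, d):.2f} positions")
```

At i = 0 the wavelength is 2π (about 6.28 positions), and it grows geometrically toward 2π · 10000 as i approaches d/2.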

Select dimension indices (i) below to plot their corresponding sine waves.