Discovering Mechanisms in Tokenized Graph Transformers

Interactive demos accompanying the paper. (Draft — narrative text to follow.)

Overview

Placeholder. This post collects five interactive figures that probe the mechanisms our tokenized graph transformers learn on degree counting, ring membership, and shortest-path distance. Full narrative to be written.

Every figure runs entirely in the browser from pre-exported, model-backed results; no backend is required.

Attention & identifier matching

Placeholder. Early attention retrieves incident edge tokens by matching node-identifier channels.
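As a rough illustration of what identifier matching could look like, the sketch below builds a toy attention pattern by hand. It is not the trained model: the one-hot identifier channels, the summed endpoint encoding for edge tokens, and the sharpness parameter `beta` are all assumptions made for this example.

```python
import math

def one_hot(index, size):
    """One-hot vector standing in for a node-identifier channel block."""
    return [1.0 if j == index else 0.0 for j in range(size)]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def incident_edge_attention(node, edges, num_nodes, beta=8.0):
    """Attention from one node token over all edge tokens (toy, hand-built).

    Assumptions: the query is the node's own identifier, each edge key is the
    sum of its two endpoint identifiers, and beta is a hypothetical sharpness.
    Self-loops are not handled.
    """
    query = one_hot(node, num_nodes)
    scores = []
    for u, v in edges:
        key = [a + b for a, b in zip(one_hot(u, num_nodes), one_hot(v, num_nodes))]
        scores.append(beta * sum(q * k for q, k in zip(query, key)))
    return softmax(scores)

edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
attn = incident_edge_attention(1, edges, num_nodes=4)
for edge, weight in zip(edges, attn):
    print(edge, round(weight, 4))
```

In this toy setup node 1 splits almost all of its attention evenly across its three incident edges, and the non-incident edge (2, 3) receives close to zero weight.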

Degree-direction steering

Placeholder. On a graph that the clean (unsteered) model labels perfectly, steering along the learned degree direction flips individual node predictions as the steering strength increases.
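A minimal sketch of steering, assuming it means adding a scaled direction vector to a node's hidden state before a linear readout. The two-dimensional hidden state, the readout rows, and the direction below are invented for illustration and are not taken from the model.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict(hidden, readout):
    """Index of the largest logit under a linear readout."""
    logits = [dot(row, hidden) for row in readout]
    return max(range(len(logits)), key=logits.__getitem__)

def steer(hidden, direction, alpha):
    """Add alpha times the steering direction to the hidden state."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Hypothetical readout for degree classes 0..3: row k is [k, -k^2 / 2],
# which picks the class closest to the first hidden coordinate.
readout = [[0.0, 0.0], [1.0, -0.5], [2.0, -2.0], [3.0, -4.5]]
degree_direction = [1.0, 0.0]   # assumed unit direction encoding degree
hidden = [1.0, 1.0]             # a node the toy readout labels as degree 1

alphas = [0.0, 0.4, 0.8, 1.2, 1.6]
sweep = [predict(steer(hidden, degree_direction, a), readout) for a in alphas]
print(list(zip(alphas, sweep)))
```

The prediction stays at degree 1 for small strengths and then steps to 2 and 3 as `alpha` grows, which is the kind of threshold behavior a strength slider exposes.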

Ring membership L1 ablation

Placeholder. Ablating the layer-1 attention output leaves ring-node predictions intact but collapses non-ring predictions, so the failure is confined to one class.
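A one-sided failure like this is easy to miss in an aggregate score, so it helps to report accuracy per class. The sketch below uses a made-up one-parameter predictor and made-up inputs purely to show the bookkeeping; `toy_predict` and its threshold are hypothetical and do not describe the model's actual readout.

```python
def per_class_accuracy(labels, preds):
    """Accuracy computed separately for each label value."""
    totals, hits = {}, {}
    for label, pred in zip(labels, preds):
        totals[label] = totals.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + (1 if label == pred else 0)
    return {c: hits[c] / totals[c] for c in totals}

def toy_predict(attn_out, ablate=False):
    """Hypothetical readout: defaults to "ring" unless the attention output
    carries negative evidence. Ablation replaces that output with zero."""
    signal = 0.0 if ablate else attn_out
    return "ring" if signal > -0.5 else "non-ring"

# Invented example: six ring nodes and two non-ring nodes.
labels = ["ring"] * 6 + ["non-ring"] * 2
attn_outputs = [1.0] * 6 + [-1.0] * 2

clean = per_class_accuracy(labels, [toy_predict(a) for a in attn_outputs])
ablated = per_class_accuracy(labels, [toy_predict(a, ablate=True) for a in attn_outputs])
overall = sum(p == y for p, y in zip(
    [toy_predict(a, ablate=True) for a in attn_outputs], labels)) / len(labels)
print(clean, ablated, overall)
```

In this invented example overall accuracy after ablation is still 0.75, while non-ring accuracy is 0.0; the per-class breakdown is what reveals the collapse.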

Shortest-path routing

Placeholder. Head L2:H2 concentrates its attention on each node’s BFS parent (the neighbor one hop closer to the root), so distance information is copied one hop at a time.
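For reference, the sketch below computes BFS parents with the textbook algorithm and then rebuilds distances by repeatedly copying "parent distance plus one". It illustrates the quantity the head's attention is being compared against; it is not a claim about how the model computes it. The example graph and the function names are made up.

```python
from collections import deque

def bfs_parents(adjacency, root):
    """Standard BFS: returns each node's parent toward the root and its distance."""
    parent = {root: None}
    distance = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return parent, distance

def propagate(parent, root, rounds):
    """Each round, a node whose parent already has a distance copies it plus one."""
    known = {root: 0}
    for _ in range(rounds):
        update = {n: known[p] + 1
                  for n, p in parent.items()
                  if p in known and n not in known}
        known.update(update)
    return known

# Invented example graph: a 4-cycle (0-1-3-2) with a tail node 4 attached to 3.
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
parent, distance = bfs_parents(adjacency, root=0)
print(parent)
print(propagate(parent, 0, rounds=1))
print(propagate(parent, 0, rounds=3))
```

After one round only the root's children have a distance; after as many rounds as the tree is deep, the propagated values match the BFS distances.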

QK identifier matrices

Placeholder. The query–key weight matrices implement identifier-equality tests: a query scores highly against a key only when both carry the same node identifier.
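One way to check for an equality test is to form the combined bilinear matrix and compare its diagonal with its off-diagonal entries on the identifier channels. The sketch below assumes the convention q = W_Q x and k = W_K y, so that the score is x^T (W_Q^T W_K) y; the weights are hand-picked toy values, and `equality_margin` is a hypothetical diagnostic rather than a metric from the paper.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def qk_matrix(w_q, w_k):
    """Bilinear form M with score(x, y) = x^T M y, for q = W_Q x and k = W_K y."""
    return matmul(transpose(w_q), w_k)

def equality_margin(m):
    """Smallest diagonal entry minus largest off-diagonal entry.

    A large positive margin means one-hot identifiers i and j score highly
    only when i == j.
    """
    n = len(m)
    smallest_diag = min(m[i][i] for i in range(n))
    largest_off = max(m[i][j] for i in range(n) for j in range(n) if i != j)
    return smallest_diag - largest_off

# Toy weights: a scaled permutation shared by queries and keys (assumption).
w_q = [[0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]
w_k = [[0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]

m = qk_matrix(w_q, w_k)
print(m)
print(equality_margin(m))
```

With these toy weights the combined matrix is a scaled identity, so the score between two one-hot identifiers is nonzero only when they are equal. On exported weights the structure would be approximate, and the margin gives a single number to track.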