Post-Transformer Hackathon · IIT Ropar × Pathway

BDH Sparse Brain
Visualizer

Interactive exploration of what makes the Dragon Hatchling architecturally different — sparse activations, Hebbian memory, and scale-free graph topology.

Activation Sparsity: BDH vs Transformer

Feed the same token to both architectures. BDH fires roughly 5% of its neurons; a transformer activates nearly all of them. Same input, dramatically different neural behavior.

BDH · Dragon Hatchling (Sparse Architecture)
ReLU-lowrank activations: only essential neurons fire
  • Active neurons: ~5%
  • Memory (context): O(n×d), constant
  • Attention: O(T), linear
  • Interpretable: ✓ monosemantic

Transformer · GPT-style (Dense Architecture)
SoftMax attention: nearly all neurons activate for every token
  • Active neurons: ~95%
  • Memory (context): KV-cache grows without bound
  • Attention: O(T²), quadratic
  • Interpretable: ✗ polysemantic
🔬 ARCHITECTURAL INSIGHT
"London" triggers currency and country synapses in BDH — only neurons encoding geographic/economic concepts fire. In a transformer, the entire weight matrix participates. This isn't pruning or regularization: sparsity is built into BDH's ReLU-lowrank design. You can point to a synapse and explain exactly what it encodes.

Hebbian Memory: "Neurons that fire together, wire together"

Watch synaptic connections strengthen in real time as BDH processes tokens. This happens during inference, with no backpropagation required. A standard transformer cannot do this: its weights are frozen once training ends.

Legend: strong synapse (learned) · weak synapse · currently firing · silent neuron
💡 WHY THIS MATTERS
BDH's σ matrix (synaptic state) has constant size regardless of how many tokens you process. A transformer's KV-cache grows linearly — eventually exhausting GPU memory. BDH has been demonstrated running at 50,000+ tokens with flat memory usage while transformers crash at ~12k on the same hardware.
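A toy sketch of the contrast above, under loudly stated assumptions: the Hebbian state here is a dense n×n matrix with a made-up learning rate and decay (BDH's actual σ is the low-rank O(n×d) form), and the "KV-cache" is just a growing list. The point is only the memory shapes: one state is constant-size, the other grows per token.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eta = 256, 16, 0.1  # illustrative sizes; eta is a hypothetical learning rate

sigma = np.zeros((n, n))  # Hebbian synaptic state: allocated once, never grows
kv_cache = []             # transformer-style cache for comparison

for t in range(1000):  # process 1000 tokens
    a = (rng.random(n) < 0.05).astype(float)  # sparse activation: ~5% of neurons fire
    # "Neurons that fire together, wire together": strengthen co-active synapses,
    # with mild decay so the state stays bounded.
    sigma = 0.99 * sigma + eta * np.outer(a, a)
    kv_cache.append(rng.normal(size=d))       # cache grows by one entry per token

print("Hebbian state entries:", sigma.size)   # constant, regardless of t
print("KV-cache entries:     ", len(kv_cache))  # grows linearly with t
```

Run it for 10x more tokens and `sigma.size` is unchanged while `len(kv_cache)` is 10x larger; that divergence is what eventually exhausts GPU memory on the transformer side.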

Scale-Free Graph Topology: BDH's Neural Architecture

BDH's neurons organize into a scale-free hub-and-spoke graph: a few highly connected hub neurons dominate, as in real brains and the internet. Click a node to highlight its connections.

📐 SCALE-FREE PROPERTY
Hub neurons (large, bright) connect to many others — like airports in a travel network. This structure emerges spontaneously during training, not by design. It's the same topology found in biological brains, the web, and protein interaction networks. Where a transformer is a dense matrix, BDH becomes this interpretable graph.
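A quick way to see how hubs emerge without anyone designing them is preferential attachment (Barabási–Albert style growth). This is a generic model of scale-free networks, not BDH's training dynamics; it is here only to show the hub-dominated degree distribution the visualization renders.

```python
import random
from collections import Counter

random.seed(0)

# Preferential attachment: each new node links to existing nodes with
# probability proportional to their current degree ("rich get richer").
# `targets` lists each node once per edge endpoint, so a uniform choice
# from it is automatically degree-proportional.
targets = [0, 1, 1, 0]  # seed graph: one edge between nodes 0 and 1
n_nodes = 2000
m = 2                   # edges added per new node

for new in range(2, n_nodes):
    picks = set()
    while len(picks) < m:
        picks.add(random.choice(targets))  # degree-proportional choice
    for t in picks:
        targets += [new, t]                # record both endpoints

degree = Counter(targets)
top = degree.most_common(5)
print("top-5 hub degrees:", top)  # a handful of hubs far above the minimum degree of 2
```

Most nodes keep degree near 2 while the top hubs accumulate degrees an order of magnitude higher: the heavy-tailed signature of scale-free topology.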

Architecture Comparison

What changes at the fundamental level — from dense matrix paradigm to sparse graph dynamics.

PROPERTY             | TRANSFORMER                                            | BDH (Dragon Hatchling)
Structure            | Dense matrix layers (every weight × every input)       | Scale-free graph of neurons: sparse, brain-like
Activation           | ~95–100% of neurons fire per token                     | ~5% of neurons fire: sparse by architecture
Memory               | KV-cache grows with every token; OOM at long context   | Hebbian synaptic state: O(n×d), constant size
Attention            | O(T²): quadratic explosion with sequence length        | O(T): linear attention, scales to unbounded context
Learning             | Frozen at inference; must retrain to learn new facts   | Hebbian updates during inference: learns while running
Interpretability     | Polysemantic: one neuron encodes multiple concepts     | Monosemantic: "currency synapses", "country synapses"
Model merging        | Requires careful fine-tuning, often fails              | Concatenate independently trained models: composable by design
Context at 50k tokens| Crashes at ~12k tokens (T4 GPU)                        | Flat memory usage, demonstrated at 50k+ tokens
Graph visualizable   | No: abstract matrix operations                         | Yes: G_x = E @ D_x renders as a literal force graph
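A sketch of the last row: composing low-rank factors into a token-conditioned adjacency matrix that a force graph can render. The shapes, the random factors, and in particular the reading of D_x as "the decoder modulated elementwise by the token embedding" are all assumptions for illustration; BDH's own factorization may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 64, 8  # illustrative: n neurons, rank d

E = rng.normal(0, 1 / np.sqrt(d), size=(n, d))  # hypothetical encoder factor
D = rng.normal(0, 1 / np.sqrt(d), size=(d, n))  # hypothetical decoder factor
x = rng.normal(size=d)                          # one token's embedding

# Assumption: D_x is the decoder modulated elementwise by the token.
# Composing gives an n x n adjacency matrix for this token, where
# G_x[i, j] is the edge weight from neuron j to neuron i.
D_x = D * x[:, None]
G_x = E @ D_x

# Keep only the strongest ~1% of edges, as a force-graph renderer would,
# so the hub structure is visible instead of a dense blob.
thresh = np.quantile(np.abs(G_x), 0.99)
edges = np.argwhere(np.abs(G_x) >= thresh)
print(f"{len(edges)} strong edges out of {n * n} possible")
```

Feeding `edges` (with `G_x` values as weights) to any force-directed layout library reproduces the per-token graph view the visualizer shows.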
BDH KEY PAPER FINDINGS
  • #1 paper on Hugging Face in its release month
  • #2 paper of all of 2025 despite its October release
  • Competitive with GPT-2 at language modeling
  • 79.54% on CIFAR-10 vs 74% for ViT-Tiny (3.2M vs 5.7M params)
  • Financial sentiment: 14M-param BDH beats 67M-param DistilBERT
REAL-WORLD ADOPTERS