Activation Sparsity: BDH vs Transformer
Feed the same token to both architectures: BDH fires roughly 5% of its neurons, while a Transformer activates nearly all of them. Same input, dramatically different neural behavior.
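Roughly, this comparison amounts to counting the fraction of post-activation units above zero. A minimal sketch, purely illustrative: the tensors `bdh_like` and `transformer_like` below are random stand-ins for the two architectures' hidden states, not outputs of the real models.

```python
import torch

def active_fraction(hidden: torch.Tensor, threshold: float = 0.0) -> float:
    """Fraction of units whose post-activation value exceeds the threshold."""
    return (hidden > threshold).float().mean().item()

# Toy stand-ins: BDH-style ReLU activity is mostly zero (shifted here so
# roughly 5% of units survive), while a dense Transformer hidden state
# leaves essentially every unit non-zero.
bdh_like = torch.relu(torch.randn(1, 4096) - 1.65)
transformer_like = torch.randn(1, 4096).abs()

print(f"BDH-like active fraction:         {active_fraction(bdh_like):.1%}")
print(f"Transformer-like active fraction: {active_fraction(transformer_like):.1%}")
```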
Hebbian Memory: "Neurons that fire together, wire together"
Watch synaptic connections strengthen in real time as BDH processes tokens. This happens during inference — no backpropagation required. Transformers cannot do this.
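A minimal sketch of the idea, assuming a fast-weight synaptic state `sigma` that is strengthened wherever pre- and post-synaptic neurons are co-active. The variable names, the learning rate `eta`, and the update rule are illustrative assumptions, not the paper's exact equations.

```python
import torch

n_neurons = 512
eta = 0.1                                        # Hebbian learning rate (assumed value)
W = 0.01 * torch.randn(n_neurons, n_neurons)     # fixed "slow" weights, random stand-in
sigma = torch.zeros(n_neurons, n_neurons)        # fast Hebbian synaptic state, starts empty

def hebbian_step(x: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """One inference step: route activity through the synapses, then strengthen
    connections between co-active (pre, post) neuron pairs."""
    y = torch.relu((W + sigma) @ x)              # post-synaptic activity
    return sigma + eta * torch.outer(y, x)       # "fire together, wire together"

# Process a short stream of token activations; sigma changes with no backward pass.
for _ in range(10):
    x = torch.relu(torch.randn(n_neurons))       # sparse-ish pre-synaptic activity
    sigma = hebbian_step(x, sigma)

print("non-zero synapses:", int((sigma != 0).sum()))
```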
Scale-Free Graph Topology: BDH's Neural Architecture
BDH organizes itself into a scale-free hub-and-spoke graph: a few highly connected hub neurons dominate, as in real brains and the internet. Click a node to highlight its connections.
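For intuition about what "a few hubs dominate" means in degree terms, a Barabási-Albert graph (via `networkx`) can stand in for BDH's learned connectivity; it is scale-free by construction. This is an illustration of the topology class, not the actual BDH graph.

```python
import networkx as nx

# Scale-free by construction: new nodes preferentially attach to existing hubs.
G = nx.barabasi_albert_graph(n=2000, m=2, seed=0)
degrees = sorted((d for _, d in G.degree()), reverse=True)

print("top-5 hub degrees:", degrees[:5])            # a handful of heavily connected hubs
print("median degree:   ", degrees[len(degrees) // 2])  # vs. a long tail of sparse nodes
```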
Architecture Comparison
What changes at the fundamental level: from the dense-matrix paradigm to sparse graph dynamics.
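As a toy contrast, the same neuron-update step can be written once as a dense matrix multiply and once as message passing over an explicit edge list. The ~2% edge density and every name below are assumptions for illustration, not BDH's actual kernels.

```python
import torch

n = 1024
x = torch.relu(torch.randn(n))                   # current neuron activations

# Dense-matrix view: every neuron talks to every neuron through W.
W = torch.randn(n, n) * 0.01
y_dense = torch.relu(W @ x)

# Sparse-graph view: only the edges that exist carry a signal.
W_sparse = W * (torch.rand(n, n) < 0.02)         # keep ~2% of edges (assumed sparsity)
edge_index = W_sparse.nonzero()                  # (num_edges, 2) list of (dst, src) pairs
messages = W_sparse[edge_index[:, 0], edge_index[:, 1]] * x[edge_index[:, 1]]
y_sparse = torch.relu(torch.zeros(n).index_add_(0, edge_index[:, 0], messages))

print("dense output dims:", tuple(y_dense.shape))
print("edges used       :", edge_index.shape[0], "of", n * n)
```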
- #1 paper on HuggingFace in its month of release
- #2 paper of all of 2025 despite an October release
- Competitive with GPT-2 at language modeling
- 79.54% on CIFAR-10 vs. 74% for ViT-Tiny (3.2M vs. 5.7M parameters)
- Financial sentiment: a 14M-parameter BDH beats a 67M-parameter DistilBERT
- 🏎 Formula 1 racing teams
- 🛡 NATO
- 📮 La Poste
- → Read the paper
- → GitHub repo