Spark
A novel neural network built from scratch in C++ and Metal. No frameworks, no PyTorch. Trained on a single laptop.
Key Results
2.92
Val BPC
20 min
Wall Time
423K
Parameters
M1 Pro
Hardware
Timeline
Feb 16, 2026
Project started
First commit. C++ engine, Metal compute shaders, custom memory allocator.
Apr 9, 2026
First published results: 2.92 val BPC. 392 training runs, 207 merged PRs, 53 days of development.
April 2026 Results
Apr 9, 2026
Setup
Task
Character-level language modeling
Dataset
WikiText-103 (raw character stream)
Hardware
Apple M1 Pro, 16 GB unified memory
Architecture
Novel architecture (no attention, no convolution)
Results
Spark is a neural network engine written from scratch in C++ with Metal compute shaders for GPU acceleration on Apple Silicon. It implements a novel architecture where the network topology itself is a learned artifact: the structure evolves during training rather than being fixed upfront.
There are no attention layers, no convolution, no skip connections borrowed from existing architectures. Every component, from the memory allocator to the GPU kernels, was built from scratch in C++ and Metal by a solo researcher over 53 days of nights-and-weekends work. No frameworks were used.
On the WikiText-103 character-level language modeling benchmark, the best run reached 2.92 validation bits-per-character (BPC) in under 20 minutes of wall time, with the network growing to 423K parameters across 2,841 hidden units and a depth of 5 layers. The entire training run executed on a single M1 Pro laptop.
Over the course of the project, 392 training runs were executed and 207 pull requests were merged.
Learning Curve
| Wall Time | Train BPC | Val BPC |
|---|---|---|
| 0s | 6.64 | 6.62 |
| 1m 12s | 4.19 | 4.22 |
| 3m 02s | 3.71 | 3.74 |
| 5m 18s | 3.48 | 3.51 |
| 7m 45s | 3.32 | 3.36 |
| 10m 10s | 3.19 | 3.23 |
| 12m 38s | 3.10 | 3.14 |
| 15m 05s | 3.02 | 3.06 |
| 17m 30s | 2.95 | 2.99 |
| 19m 53s | 2.89 | 2.93 |
Best validation BPC: 2.92 (measured at final evaluation checkpoint)
Final Network Statistics
225,749
Edges
2,841
Hidden Units
5
Depth (Layers)
423K
Parameters
Articles
Sub-500K Parameters on WikiText-103 Char-Level
What can 423K parameters achieve? 2.92 BPC in 20 minutes on a single M1 Pro laptop.
Why Atomic-Free Sparse Backward Passes Are Slower on Metal
Eliminating atomics sounds like a win. On Apple Metal, it's a 2.8x slowdown.
Metal Compute Shader Patterns for Sparse ML Training
Zero academic literature exists on Metal ML kernels. Here's what we learned building one from scratch.