Before you buy a Ferrari to drive to the grocery store, try walking.

Tim Menzies & Srinath Srinivasan · June 2026

We built EZR.py: a 400-line Python toolkit, stdlib only, under 1 MB to install. On 120+ tabular SE optimization tasks it matches or beats SMAC3, SHAP, LIME, and FASTREAD, while running 500× faster than SMAC3 and needing fewer than 100 labels.

How? Read the code. Strip the redundancy. Many "different" algorithms (classification, clustering, optimization, text mining) collapse to the same four classes: Num, Sym, Cols, Data. A one-line change flips a decision tree from numeric to symbolic prediction. Simulated Annealing, from 1983, still beats modern Local Search variants. Naive Bayes in 30 lines beats SVM on text.
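To make the "same classes, one-line flip" idea concrete, here is a minimal sketch. It is not the EZR.py source: the method names `add`, `mid`, `spread` and the helper `summarize` are assumptions made for illustration. The point is only that if Num and Sym share an interface, code that summarizes a tree leaf need not care which one it holds.

```python
import math
from collections import Counter

class Num:
    """Incrementally summarizes a stream of numbers (Welford's method)."""
    def __init__(self):
        self.n, self.mu, self.m2 = 0, 0.0, 0.0
    def add(self, x):
        self.n += 1
        d = x - self.mu
        self.mu += d / self.n
        self.m2 += d * (x - self.mu)
    def mid(self):       # central tendency: mean
        return self.mu
    def spread(self):    # variability: sample standard deviation
        return 0.0 if self.n < 2 else math.sqrt(self.m2 / (self.n - 1))

class Sym:
    """Incrementally summarizes a stream of symbols."""
    def __init__(self):
        self.n, self.has = 0, Counter()
    def add(self, x):
        self.n += 1
        self.has[x] += 1
    def mid(self):       # central tendency: mode
        return self.has.most_common(1)[0][0]
    def spread(self):    # variability: entropy, in bits
        return -sum(c / self.n * math.log2(c / self.n)
                    for c in self.has.values())

def summarize(values, klass):
    """Hypothetical leaf summary: `klass` is the only thing that changes."""
    col = klass()
    for v in values:
        col.add(v)
    return col.mid(), col.spread()
```

Under these assumptions, `summarize(ys, Num)` gives a regression-style leaf (mean, standard deviation) and `summarize(ys, Sym)` gives a classification-style leaf (mode, entropy). Swapping the class is the one-line change.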

Caveat. Scope is tabular SE problems. Generative tasks (LLMs, images) are still to be determined; complexity may earn its keep there. But the text-mining result (30-line Naive Bayes beating SVM on FASTREAD) hints the EZR approach may extend past tables into text.
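For readers wondering how small a text classifier can be, here is a sketch of a generic multinomial Naive Bayes with add-one (Laplace) smoothing, stdlib only. It is an illustration, not the authors' 30-line implementation: the class name `NaiveBayes`, its methods, and the whitespace tokenizer are all assumptions.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over whitespace-separated words (sketch)."""
    def __init__(self):
        self.docs = Counter()               # label -> number of documents
        self.words = defaultdict(Counter)   # label -> word -> count
        self.vocab = set()                  # every word seen in training

    def train(self, text, label):
        self.docs[label] += 1
        for w in text.lower().split():
            self.words[label][w] += 1
            self.vocab.add(w)

    def classify(self, text):
        total = sum(self.docs.values())
        tokens = text.lower().split()
        def loglike(label):
            n = sum(self.words[label].values())
            score = math.log(self.docs[label] / total)    # log prior
            for w in tokens:                              # log likelihoods, add-one smoothed
                score += math.log((self.words[label][w] + 1) / (n + len(self.vocab)))
            return score
        return max(self.docs, key=loglike)
```

Usage, on made-up toy data:

```python
nb = NaiveBayes()
nb.train("defect prediction with static code metrics", "relevant")
nb.train("software effort estimation models", "relevant")
nb.train("protein folding in yeast cells", "irrelevant")
nb.train("galaxy formation and dark matter", "irrelevant")
print(nb.classify("code metrics for software defect"))
```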

Six Myths

  1. Always need heavy infra. No — stdlib.
  2. Each task needs its own algo. No — same 4 classes.
  3. Trees differ by prediction type. No — 1-line flip.
  4. Newer always beats older. No — SA'83 wins.
  5. Always need massive data. No — 100 labels = 85-95% optimal.
  6. Text always needs advanced models. No — 30-line NB beats SVM.
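
Myth 4 leans on simulated annealing, so here is roughly how little code that takes. This is a textbook-style sketch, not EZR.py's version: the function name `anneal`, the linear cooling schedule, the step count, and the seed are all assumptions.

```python
import math
import random

def anneal(cost, start, neighbor, steps=2000, seed=1):
    """Minimize `cost` by simulated annealing; returns (best, best_cost)."""
    rng = random.Random(seed)
    now = best = start
    e_now = e_best = cost(start)
    for k in range(steps):
        temp = 1 - k / steps             # cools linearly from 1 to 1/steps
        nxt = neighbor(now, rng)
        e_nxt = cost(nxt)
        # Always accept improvements. Accept a worse move with
        # probability exp(-delta / temp), which shrinks as we cool.
        if e_nxt < e_now or rng.random() < math.exp((e_now - e_nxt) / temp):
            now, e_now = nxt, e_nxt
            if e_now < e_best:
                best, e_best = now, e_now
    return best, e_best
```

Usage, on a toy one-dimensional problem whose minimum is at x = 3:

```python
best, e = anneal(cost=lambda x: (x - 3) ** 2,
                 start=0.0,
                 neighbor=lambda x, rng: x + rng.uniform(-1, 1))
```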

By the Numbers

  vs. SMAC3           500× faster
  labels to optimum   < 100
  features used       < 10
  code size           400 lines
  install size        < 1 MB
  tasks tested        120+

If a simple model matches a complex one,
the complex one is technical debt.

Developers still need to read code. At least for tabular optimization.


Read the paper (arXiv:2606.03640)

NC State ©2026, timm, MIT License