http://tiny.cc/ezr
ezr: explainable multi-objective optimization. Two files, ~1100 lines, zero dependencies, pure Python stdlib. An experiment in "how low can you go?": active learning labels a few dozen informative rows, builds a regression tree, and sorts the rest. Repeated studies show that labelling just the first ~5 examples optimizes as well as or better than SMAC, at two orders of magnitude less cost.
# sibling data gists supply the CSVs (no data lives in here)
git clone http://tiny.cc/optimiz # optimization data
git clone http://tiny.cc/klassif # classification data
git clone http://tiny.cc/ezr && cd ezr
python3 cli.py --list # all commands
python3 cli.py --tree ../optimiz/auto93.csv
python3 cli.py --all # run every self-test
Sections: NAME | SYNOPSIS | DESCRIPTION | DATA | COMMANDS | OPTIONS | LAYOUT | LICENSE | AUTHOR
NAME
ezr - explainable multi-objective optimization via decision
trees, clustering, naive bayes, and active learning
SYNOPSIS
python3 cli.py [--key=val ...] --<name> [FILE]
python3 cli.py --list | --fast | --slow | --all | --help
p # konfig bashrc alias: python3 -B cli.py
Sibling gists (one parent dir; no naked paths):
ezr/      this repo (ezr.py library + cli.py dispatch)
optimiz/  optimization CSVs (tiny.cc/optimiz)
klassif/  classification CSVs (tiny.cc/klassif)
textz/    text-mining CSVs (tiny.cc/textz)
konfig/   shared Makefile + dotfiles (make help|sh|vi|...)
DESCRIPTION
Summarizes CSV into Num/Sym columns; grows decision trees that
minimize distance to the ideal outcome; clusters via k-means or
recursive halving; classifies + actively learns with naive bayes
or centroid acquisition. Input is CSV; the header defines roles
(see DATA). Stdlib only, Python 3.12+.
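
To make "distance to the ideal outcome" concrete, here is a minimal sketch, not ezr's actual code. It assumes each goal column is min-max normalized to 0..1, that the ideal is 1 for a maximize goal and 0 for a minimize goal, and that the per-goal gaps are combined with a Minkowski exponent `p`. The names `norm` and `dist_to_ideal`, and the column ranges, are made up for illustration.

```python
def norm(x, lo, hi):
    """Min-max normalize x into 0..1 (tiny epsilon avoids divide-by-zero)."""
    return (x - lo) / (hi - lo + 1e-32)

def dist_to_ideal(row, goals, p=2):
    """Distance from a row's goal values to the ideal point.

    goals: list of (index, lo, hi, best); best is 1 to maximize, 0 to minimize.
    Hypothetical layout, chosen only for this sketch.
    """
    total = sum(abs(norm(row[i], lo, hi) - best) ** p
                for i, lo, hi, best in goals)
    return (total / len(goals)) ** (1 / p)

# Two goals with made-up ranges: column 0 like "Lbs-", column 1 like "Mpg+".
GOALS = [(0, 1600, 5200, 0), (1, 10, 50, 1)]
ROWS = [[5200, 10], [3400, 30], [1600, 50]]

# Smaller distance means closer to the ideal, so sorting ranks rows best-first.
RANKED = sorted(ROWS, key=lambda r: dist_to_ideal(r, GOALS))
```

Under these assumptions the light, high-mileage row sorts first and the heavy, low-mileage row sorts last.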
DATA
Header column names declare each role:
[A-Z]*    numeric        (e.g. "Age")
[a-z]*    symbolic       (e.g. "job")
[A-Z]*+   maximize goal  (e.g. "Mpg+")
[A-Z]*-   minimize goal  (e.g. "Lbs-")
[a-z]*!   class label    (e.g. "sick!")
*X        ignored        (e.g. "idX")
?         missing value  (in rows, not the header)
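
The header rules above can be sketched as a small classifier. This is an illustration of the naming convention only; the function names `role` and `split_xy` and the dictionary layout are assumptions, not taken from ezr.py.

```python
def role(name):
    """Classify one header name by its first and last characters."""
    s = name.strip()
    last = s[-1]
    return {
        "name": s,
        "numeric": s[0].isupper(),   # leading upper case: numeric, else symbolic
        "skip": last == "X",         # trailing X: ignore this column
        "klass": last == "!",        # trailing !: class label
        "goal": last in ("+", "-"),  # trailing + or -: optimization goal
        "maximize": last == "+",
    }

def split_xy(header):
    """Split header names into independent (x) and dependent (y) columns."""
    cols = [role(n) for n in header]
    used = [c for c in cols if not c["skip"]]
    x = [c["name"] for c in used if not (c["goal"] or c["klass"])]
    y = [c["name"] for c in used if c["goal"] or c["klass"]]
    return x, y

X, Y = split_xy(["Age", "job", "Mpg+", "Lbs-", "sick!", "idX"])
```

Here `X` holds the plain numeric and symbolic columns, `Y` holds the goals and the class label, and `idX` is dropped.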
COMMANDS
Each `test_<name>` in cli.py is one command (demo + self-check),
run via `--<name>`. No FILE -> default dataset; FILE -> that CSV.
--core       primitives: Num/Sym/Data/distance/format
--tree       grow + show a regression tree, check plans
--cluster    k-means++ / k-means / recursive halving
--classify   naive bayes beats ZeroR (needs ../klassif)
--search     sa | ls | de optimizers (energy trace)
--acquire    active learning beats random (20 reps)
--acquire20  hold-out tree win (acquire half, sort the other)
--textmine   CNB + tf-idf text mining (needs ../textz)
--stats      same / bestRanks / confused
lanes: --fast (skip slow) | --slow (textmine) | --all
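
One way a `--<name>` flag could be mapped onto a `test_<name>` function is sketched below. It is a guess at the general pattern, not a copy of cli.py: the two stub commands, the `dispatch` and `commands` helpers, and the explicit `CMDS` table are all hypothetical.

```python
def test_core(file=None):
    """Stub command: report which dataset would be used."""
    return f"core on {file or 'default dataset'}"

def test_tree(file=None):
    """Stub command: report which dataset would be used."""
    return f"tree on {file or 'default dataset'}"

CMDS = {"test_core": test_core, "test_tree": test_tree}

def commands(namespace):
    """Names available as --<name>, in the spirit of --list."""
    return sorted(k[5:] for k in namespace if k.startswith("test_"))

def dispatch(argv, namespace):
    """Run test_<name> for each --<name>; a following non-flag word is FILE."""
    out, i = [], 0
    while i < len(argv):
        arg = argv[i]
        fn = namespace.get("test_" + arg[2:]) if arg.startswith("--") else None
        if fn:
            file = None
            if i + 1 < len(argv) and not argv[i + 1].startswith("--"):
                file = argv[i + 1]   # optional FILE argument
                i += 1
            out.append(fn(file))
        i += 1
    return out

RESULTS = dispatch(["--tree", "../optimiz/auto93.csv", "--core"], CMDS)
```

Looking functions up by name means that adding a command only requires defining one more `test_<name>` function; flags with no matching function (such as `--seed=1`) are simply skipped by this sketch.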
OPTIONS
--seed=1           random seed
--p=2              distance exponent (1 = Manhattan, 2 = Euclidean)
--few=128          max rows kept while sampling
--learn.leaf=3     examples per tree leaf
--learn.start=4    initial labels
--learn.budget=50  rows allowed to be labelled
--learn.check=5    guesses to check
--bayes.m=2        m-estimate
--bayes.k=1        laplace
(full list: head of ezr.py; override any as --key=val)
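
As a rough illustration of `--key=val` overrides, the sketch below applies command-line values on top of the defaults listed above. The default values come from this section; storing them in a flat dict keyed by dotted names, and the helpers `coerce` and `settings`, are assumptions made for the example rather than how ezr.py does it.

```python
DEFAULTS = {
    "seed": 1, "p": 2, "few": 128,
    "learn.leaf": 3, "learn.start": 4, "learn.budget": 50, "learn.check": 5,
    "bayes.m": 2, "bayes.k": 1,
}

def coerce(s):
    """Turn a command-line string into int, float, bool, or leave it a string."""
    for kind in (int, float):
        try:
            return kind(s)
        except ValueError:
            pass
    return {"true": True, "false": False}.get(s.lower(), s)

def settings(argv, defaults=DEFAULTS):
    """Copy the defaults, then apply every --key=val whose key is known."""
    the = dict(defaults)
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            key, val = arg[2:].split("=", 1)
            if key in the:
                the[key] = coerce(val)
    return the

THE = settings(["--seed=42", "--learn.budget=30", "--tree"])
```

Flags without an `=` (such as `--tree`) are left for the command dispatcher, and unknown keys are ignored instead of raising an error; a stricter tool might prefer to fail loudly.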
LAYOUT
ezr.py library; section banners per app (Types, Col, Data,
Distance, Bayes, Tree, Cluster, Classify, Search,
Acquire, Textmine, Stats, Format)
cli.py dispatch; one test_<name> per concept (demo + assert),
run via --<name>; --fast/--slow/--all lanes
LICENSE
MIT. https://choosealicense.com/licenses/mit/
AUTHOR
Tim Menzies <timm@ieee.org>