
http://tiny.cc/fairnez

Standard fairness-benchmark CSVs in fft column-suffix format: Adult, German, Bank, COMPAS. Each has one or more sensitive attributes (race, sex, age) and a binary klass. Use with fft.

# install and test
git clone http://tiny.cc/fairnez && git clone http://tiny.cc/fft fft
cd fft && python3 -B eval.py -f ../fairnez/compas.csv

Sections: NAME | SYNOPSIS | DATA | DATASETS | NOTES | RECREATE | REFERENCES | SEE ALSO | LICENSE | AUTHOR

Files: adult.csv | bank.csv | communities.csv | compas.csv | german.csv | law.csv | convert.py | protect.py | labels.py

NAME

fairnez - fairness benchmark datasets, fft-CSV format

SYNOPSIS

python3 -B eval.py -f ../fairnez/<dataset>.csv    # run from inside fft/

DATA

Column-name protocol (see ../konfig/style_gist.md):
  Cap...    leading capital: numeric (Num)
  lower...  leading lowercase: symbolic (Sym)
  ...!      trailing "!": klass (binary target)
  ...X      trailing "X": ignored
  ?         cell value for missing data
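A minimal Python sketch of how a reader could apply this protocol to a CSV header and its cells. The helper names (`col_role`, `cell`, `read_rows`) are hypothetical, not fft's actual API, and checking the trailing "X" before the trailing "!" is an assumption.

```python
import csv
import io

def col_role(name):
    """Classify a column name: returns (kind, role).
    kind is "Num" (leading capital) or "Sym"; role is "ignore", "klass" or "x"."""
    kind = "Num" if name[:1].isupper() else "Sym"
    if name.endswith("X"):
        role = "ignore"          # trailing X: column is skipped
    elif name.endswith("!"):
        role = "klass"           # trailing !: binary target
    else:
        role = "x"               # ordinary input feature
    return kind, role

def cell(value, kind):
    """Convert one raw cell: '?' -> None, Num -> float, Sym -> str."""
    value = value.strip()
    if value == "?":
        return None
    return float(value) if kind == "Num" else value

def read_rows(text):
    """Parse CSV text; return (kept column names, converted rows), dropping ignored columns."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]
    roles = [col_role(h) for h in header]
    keep = [i for i, (_, role) in enumerate(roles) if role != "ignore"]
    rows = [[cell(row[i], roles[i][0]) for i in keep] for row in reader if row]
    return [header[i] for i in keep], rows

names, rows = read_rows("Age,sex,FnlwgtX,income!\n39,Male,77516,<=50K\n")
print(names, rows)   # ['Age', 'sex', 'income!'] [[39.0, 'Male', '<=50K']]
```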

DATASETS

file          rows    cols  klass!           sensitive    source
-----------   -----   ----  --------------   -----------  ------
adult.csv     30162    15   income           sex, race    UCI [6]
german.csv     1000    21   credit           sex_marital  UCI [1]
bank.csv      45211    17   y                age          UCI [2]
compas.csv     6479    10   two_year_recid   sex, race    ProPublica [4,5]
meps.csv         --    --   --               race         AHRQ [3] -- gated

NOTES

adult.csv     UCI Adult / Census Income. Rows containing "?"
              dropped (30162 of 32561 kept). Target income! in
              {<=50K, >50K}. Suggested sensitive: sex
              (Male/Female), race.
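A minimal sketch of the "?"-row drop, assuming the raw UCI `adult.data` layout (comma plus space between values, possibly trailing blank lines). `clean_adult` is a hypothetical name; this is not convert.py, though applied to the full raw file it should give the 30162-of-32561 count above if convert.py does the same thing.

```python
import csv
import io

def clean_adult(raw_text):
    """Parse raw adult.data text and drop every row that contains a '?' cell."""
    # skipinitialspace strips the blank after each comma (" ?" -> "?")
    reader = csv.reader(io.StringIO(raw_text), skipinitialspace=True)
    rows = [row for row in reader if row]            # skip blank lines
    return [row for row in rows if "?" not in row]   # '?' marks a missing value

print(clean_adult("39, State-gov, Male, <=50K\n54, ?, Female, >50K\n"))
# [['39', 'State-gov', 'Male', '<=50K']]
```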

german.csv    UCI Statlog German Credit. 1000 rows, attribute codes
              A11..A201 unchanged. Target credit! in {good, bad}
              (recoded from 1/2). Sex is embedded in sex_marital
              (A91..A95 jointly code sex and marital status).
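Because sex and marital status share one code, a sex-only sensitive attribute has to be decoded. A small sketch using the code meanings from UCI's german.doc (attribute 9, and class 1 = good, 2 = bad); the table and function names are hypothetical, not taken from convert.py.

```python
# UCI german.doc, attribute 9 "personal status and sex": code -> (sex, marital status)
SEX_MARITAL = {
    "A91": ("male", "divorced/separated"),
    "A92": ("female", "divorced/separated/married"),
    "A93": ("male", "single"),
    "A94": ("male", "married/widowed"),
    "A95": ("female", "single"),
}
CREDIT = {"1": "good", "2": "bad"}   # original UCI class label -> credit!

def sex_of(code):
    """Return the sex half of a sex_marital code such as 'A93'."""
    return SEX_MARITAL[code.strip()][0]

def credit_of(label):
    """Recode a raw UCI class label ('1' or '2') to good/bad."""
    return CREDIT[label.strip()]

print(sex_of("A92"), credit_of("2"))   # female bad
```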

bank.csv      UCI Bank Marketing (bank-full). Target y! in {yes,no}:
              did the client subscribe to a term deposit? Sensitive:
              age (a >=25 split is commonly used).
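A sketch of the age split and of the per-group "yes" rate it makes visible. The group labels "old"/"young" and both function names are hypothetical; only the 25 cutoff comes from the note above.

```python
from collections import Counter

def age_group(age, cutoff=25):
    """Binary sensitive attribute: 'old' if age >= cutoff, else 'young'."""
    return "old" if float(age) >= cutoff else "young"

def yes_rate_by_group(pairs, cutoff=25):
    """pairs: iterable of (age, y). Returns {group: fraction of rows with y == 'yes'}."""
    total, yes = Counter(), Counter()
    for age, y in pairs:
        g = age_group(age, cutoff)
        total[g] += 1
        yes[g] += int(y == "yes")
    return {g: yes[g] / total[g] for g in total}

print(yes_rate_by_group([(30, "yes"), (40, "no"), (22, "yes"), (19, "no")]))
# {'old': 0.5, 'young': 0.5}
```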

compas.csv    ProPublica two-year recidivism. Filtered per
              ProPublica's standard rules: days_b_screening_arrest
              in [-30,30], is_recid != -1, c_charge_degree != "O",
              score_text != "N/A". Target two_year_recid! in {0,1}.
              Sensitive: race (African-American vs others), sex.
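The four filters can be sketched in Python over rows keyed by ProPublica's raw column names (`days_b_screening_arrest`, `is_recid`, `c_charge_degree`, `score_text`). This is an illustration, not convert.py; `propublica_filter` is a hypothetical name, and dropping rows whose `days_b_screening_arrest` is blank is an assumption (it matches how a NA comparison drops rows in R).

```python
import csv
import io

def propublica_filter(rows):
    """Keep rows that pass ProPublica's standard COMPAS filters.
    rows: dicts of strings keyed by raw ProPublica column names."""
    kept = []
    for r in rows:
        days = r["days_b_screening_arrest"].strip()
        if days == "" or not -30 <= float(days) <= 30:   # blank (NA) dropped: assumption
            continue
        if r["is_recid"].strip() == "-1":                # no COMPAS case found
            continue
        if r["c_charge_degree"].strip() == "O":          # ordinary traffic offence
            continue
        if r["score_text"].strip() == "N/A":
            continue
        kept.append(r)
    return kept

sample = ("days_b_screening_arrest,is_recid,c_charge_degree,score_text,two_year_recid\n"
          "-1,1,F,High,1\n45,0,M,Low,0\n0,0,O,Low,0\n")
print(len(propublica_filter(csv.DictReader(io.StringIO(sample)))))   # 1
```

To apply it to real data, one would pass a `csv.DictReader` over ProPublica's `compas-scores-two-years.csv` (assumed here to be the raw input kept under raw/).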

meps.csv      MEPS HC-181 panel. NOT included: AHRQ requires a
              click-through usage agreement. Download manually from
              https://meps.ahrq.gov/mepsweb/data_stats/download_data_files_detail.jsp?cboPufNumber=HC-181
              then convert. AIF360 ships a preprocessing script.

RECREATE

python3 -B convert.py    # rebuilds *.csv from raw/

Raw sources are in raw/ (UCI archives, ProPublica GitHub). MIT
license applies to *this packaging*; original datasets keep their
own terms (UCI = CC BY 4.0; ProPublica = open).

REFERENCES

[1] 1994. The German Credit dataset.
    https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)
[2] 2014. The Bank dataset.
    https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
[3] 2015. The MEPS dataset.
    https://meps.ahrq.gov/mepsweb/data_stats/download_data_files_detail.jsp?cboPufNumber=HC-181
[4] 2016. The COMPAS dataset.
    https://github.com/propublica/compas-analysis
[5] 2016. Machine bias.
    https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
[6] 2017. The Adult Census Income dataset.
    https://archive.ics.uci.edu/ml/datasets/adult

SEE ALSO

http://tiny.cc/fft      trees + ensembles using these CSVs
http://tiny.cc/optimiz  optimization CSVs (auto93, config_SS-N, ...)
http://tiny.cc/konfig   shared Makefile, bashrc, nvim, tmux

LICENSE

MIT.  https://choosealicense.com/licenses/mit/
(c) 2025 Tim Menzies.

AUTHOR

Tim Menzies <timm@ieee.org>

built by gistsite