http://tiny.cc/textz
Example text-mining datasets: four software-engineering
systematic-review "reading" corpora (Hall, Wahono, Radjenovic,
Kitchenham). Each ships in two forms: NAME.csv (processed
feature table) and NAME_raw.csv (raw abstracts).
CSV headers are self-describing. Data only, no code. Feeds the
textmine demos in ezr.
git clone http://tiny.cc/konfig ../konfig
git clone http://tiny.cc/textz textz && cd textz
make help
Sections: NAME | DATA | FILES | LICENSE | AUTHOR
NAME
textz - text-mining example datasets (SE systematic reviews)
DATA
Each review = two CSVs:
  NAME.csv      processed: one row per document, label column
                marks the relevant/included papers
  NAME_raw.csv  raw: document text for tokenize/stem/tf-idf
Used as:
  ezr textmine ../textz/Hall.csv   (processed, CNB)
  ezr test_textmine                (Hall + Hall_raw)
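
The tokenize and tf-idf steps mentioned for the raw files can be sketched
as below. This is a minimal illustration, not the ezr implementation: the
column names "text" and "label" and the three sample rows are hypothetical
(check the real CSV header before adapting it), stemming is left out, and
the weighting is the plain tf * log(N/df) form.

```python
import csv
import io
import math
import re
from collections import Counter

# Hypothetical stand-in for a NAME_raw.csv file. The real column names are
# not documented here, so "text" and "label" are assumptions.
SAMPLE = """text,label
defect prediction using software metrics,yes
software metrics for fault prediction models,yes
a survey of user interface design,no
"""

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N/df)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    out = []
    for doc in docs:
        counts = Counter(doc)
        out.append({t: (c / len(doc)) * math.log(n / df[t])
                    for t, c in counts.items()})
    return out

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
docs = [tokenize(r["text"]) for r in rows]
weights = tf_idf(docs)
```

Terms confined to one document (such as "defect") score higher than terms
shared across documents (such as "software"). To try this on a real file,
replace io.StringIO(SAMPLE) with an open file handle and adjust the column
name to match the header.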
FILES
Hall Wahono Radjenovic Kitchenham
(each as NAME.csv + NAME_raw.csv)
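
A quick way to confirm a checkout has all eight files is sketched below.
The helper names expected_files and missing_files are illustrative, not
part of textz or ezr. The layout is assumed to be flat, with every CSV in
the repository root.

```python
from pathlib import Path

# The four review names listed under FILES.
REVIEWS = ["Hall", "Wahono", "Radjenovic", "Kitchenham"]

def expected_files(root="."):
    """Return the eight expected paths: NAME.csv and NAME_raw.csv per review."""
    return [Path(root) / f"{name}{suffix}.csv"
            for name in REVIEWS
            for suffix in ("", "_raw")]

def missing_files(root="."):
    """Return the expected paths that do not exist under root."""
    return [p for p in expected_files(root) if not p.is_file()]
```

Calling missing_files("textz") from the parent directory returns an empty
list when the clone is complete.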
LICENSE
MIT. https://choosealicense.com/licenses/mit/
AUTHOR
Tim Menzies <timm@ieee.org>