textz

https://github.com/aiez/textz

Example text-mining datasets: four software-engineering systematic-review "reading" corpora (Hall, Wahono, Radjenovic, Kitchenham). Each ships two ways: NAME.csv (processed feature table) and NAME_raw.csv (raw abstracts). CSV headers are self-describing; the repo holds data only, no code. The files feed the textmine demos in ezr.

git clone https://github.com/aiez/konfig ../konfig
git clone https://github.com/aiez/textz textz && cd textz
make help

Sections: NAME | DATA | FILES | LICENSE | AUTHOR

NAME

textz - text-mining example datasets (SE systematic reviews)

DATA

Each review = two CSVs:
  NAME.csv      processed: one row per document; a label column
                marks the relevant (included) papers
  NAME_raw.csv  raw: document text, for tokenize/stem/tf-idf
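The raw files are meant to go through tokenize/stem/tf-idf before modelling. A minimal sketch of that pipeline in plain Python follows. It is not ezr's implementation. The suffix stripper is a crude stand-in for a real stemmer such as Porter, and the text column name passed to read_texts is a hypothetical placeholder: check the actual header of NAME_raw.csv.

```python
import csv
import math
import re
from collections import Counter

# Crude suffix stripper standing in for a real stemmer (assumption, not ezr's).
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Lowercase, keep alphabetic runs, drop words under 3 letters, strip one suffix."""
    out = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if len(word) < 3:
            continue
        for suf in SUFFIXES:
            if word.endswith(suf) and len(word) - len(suf) >= 3:
                word = word[: -len(suf)]
                break
        out.append(word)
    return out

def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each document counts a term at most once
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1
        vectors.append({t: (k / total) * math.log(n / df[t]) for t, k in c.items()})
    return vectors

def read_texts(lines, text_column):
    """Read one text field per row from CSV lines (e.g. an open file).
    text_column is a hypothetical header name; check the real file."""
    return [row[text_column] for row in csv.DictReader(lines)]
```

Usage, assuming a hypothetical "abstract" column: open Hall_raw.csv with newline="", pass the file to read_texts(f, "abstract"), then call tfidf on the result. Terms that occur in every document get weight 0, which is the point of the idf factor.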

Used as:  ezr textmine ../textz/Hall.csv     (processed, CNB)
          ezr test_textmine                  (Hall + Hall_raw)

FILES

Hall        Wahono       Radjenovic       Kitchenham
(each as NAME.csv + NAME_raw.csv)
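With four reviews in two forms each, a checkout should hold eight CSVs. The sketch below lists the expected names, reports any that are missing, and counts documents and relevant documents in a processed file. The label column name and its "relevant" value are hypothetical parameters, not taken from the real headers. Read the self-describing header of NAME.csv and pass the right values.

```python
import csv
from pathlib import Path

REVIEWS = ["Hall", "Wahono", "Radjenovic", "Kitchenham"]

def expected_files(names=REVIEWS):
    """Each review ships as NAME.csv (processed) plus NAME_raw.csv (raw text)."""
    return [f"{n}{suffix}.csv" for n in names for suffix in ("", "_raw")]

def missing_files(root="."):
    """Return the expected CSV names not present under root."""
    return [f for f in expected_files() if not (Path(root) / f).exists()]

def summarize(rows, label_column, relevant_value):
    """Count (documents, relevant documents) over dict rows.
    label_column and relevant_value are hypothetical; check the real header."""
    total = relevant = 0
    for row in rows:
        total += 1
        if row[label_column] == relevant_value:
            relevant += 1
    return total, relevant

def summarize_file(path, label_column, relevant_value):
    """Open one processed CSV and summarize it."""
    with open(path, newline="") as f:
        return summarize(csv.DictReader(f), label_column, relevant_value)

if __name__ == "__main__":
    # Run from inside the textz checkout.
    print("missing:", missing_files() or "none")
```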

LICENSE

MIT. https://choosealicense.com/licenses/mit/

AUTHOR

Tim Menzies <timm@ieee.org>

built by gistsite