pyxclib

Tools for extreme multi-label classification problems.

git clone https://github.com/kunaldahiya/pyxclib.git
cd pyxclib
python3 setup.py install --user

Usage

Data reading/writing

from xclib.data import data_utils

# Read file with features and labels (old format from XMLRepo)
features, labels, num_samples, num_features, num_labels = data_utils.read_data('train.txt')

# Read sparse file (see docstring for more)
# header can be set to False (if required)
labels = data_utils.read_sparse_file('trn_X_Xf.txt', header=True)

# Write sparse file (with header)
data_utils.write_sparse_file(labels, "labels.txt")
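
To make the sparse file layout concrete, here is a minimal sketch of a reader and writer built only on SciPy. It is not the xclib implementation. It assumes the layout is an optional header line `num_rows num_cols` followed by one line per row of space-separated `column:value` pairs with zero-based column indices. The names `write_sparse` and `read_sparse` are hypothetical helpers introduced for illustration.

```python
import io

import numpy as np
from scipy.sparse import csr_matrix


def write_sparse(matrix, handle, header=True):
    """Write a matrix as 'column:value' pairs, one row per line (assumed layout)."""
    matrix = csr_matrix(matrix)
    if header:
        handle.write(f"{matrix.shape[0]} {matrix.shape[1]}\n")
    for i in range(matrix.shape[0]):
        start, end = matrix.indptr[i], matrix.indptr[i + 1]
        pairs = (
            f"{int(c)}:{float(v):g}"
            for c, v in zip(matrix.indices[start:end], matrix.data[start:end])
        )
        handle.write(" ".join(pairs) + "\n")


def read_sparse(handle, header=True):
    """Read the layout produced by write_sparse into a CSR matrix."""
    lines = handle.read().splitlines()
    shape = None
    if header:
        num_rows, num_cols = map(int, lines[0].split())
        shape = (num_rows, num_cols)
        lines = lines[1:]
    data, indices, indptr = [], [], [0]
    for line in lines:
        for pair in line.split():
            col, val = pair.split(":")
            indices.append(int(col))
            data.append(float(val))
        indptr.append(len(indices))
    if shape is None:
        # Without a header the shape has to be inferred from the content.
        num_cols = max(indices) + 1 if indices else 0
        shape = (len(lines), num_cols)
    return csr_matrix((data, indices, indptr), shape=shape, dtype=np.float32)


# Round trip through an in-memory buffer
original = csr_matrix(np.array([[1.0, 0, 0, 0.5], [0, 0, 0, 0], [0, 2.0, 0, 0]]))
buffer = io.StringIO()
write_sparse(original, buffer)
buffer.seek(0)
restored = read_sparse(buffer)
```

Keeping the header means the shape survives even when trailing columns or rows are empty. Without it, the reader can only infer the shape from the largest index it sees.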

Evaluation

Implementations of precision, nDCG, propensity-scored precision/nDCG, and recall are included.

from xclib.data import data_utils
import xclib.evaluation.xc_metrics as xc_metrics

# Read ground truth and predictions
true_labels = data_utils.read_sparse_file('tst_X_Y.txt')
predicted_labels = data_utils.read_sparse_file('parabel_predictions.txt')

# Evaluate (see examples/evaluate.py for more details)
acc = xc_metrics.Metrics(true_labels=true_labels)
args = acc.eval(predicted_labels, 5)
print(xc_metrics.format(*args))
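
For reference, the two unweighted metrics can be sketched in plain NumPy. This is an illustration of the standard definitions on small dense arrays, not the code in `xc_metrics`. The function names `precision_at_k` and `ndcg_at_k` are hypothetical, and the nDCG normalizer is assumed to use the top min(k, number of relevant labels) positions.

```python
import numpy as np


def _top_k_hits(true_labels, scores, k):
    """Relevance (0/1) of the k highest-scoring labels per row, best first."""
    true_labels = np.asarray(true_labels, dtype=float)
    scores = np.asarray(scores, dtype=float)
    top = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(true_labels, top, axis=1)


def precision_at_k(true_labels, scores, k):
    """Return an array [P@1, ..., P@k], averaged over rows."""
    hits = _top_k_hits(true_labels, scores, k)
    return np.cumsum(hits, axis=1).mean(axis=0) / np.arange(1, k + 1)


def ndcg_at_k(true_labels, scores, k):
    """Return nDCG@k averaged over rows; rows with no relevant label score 0."""
    hits = _top_k_hits(true_labels, scores, k)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = (hits * discounts).sum(axis=1)
    # Ideal DCG places every relevant label (up to k of them) at the top.
    n_rel = np.minimum(np.asarray(true_labels).sum(axis=1), k).astype(int)
    idcg = np.concatenate(([0.0], np.cumsum(discounts)))[n_rel]
    ndcg = np.divide(dcg, idcg, out=np.zeros_like(dcg), where=idcg > 0)
    return ndcg.mean()


# Toy example: 2 samples, 4 labels
truth = np.array([[1, 0, 1, 0], [0, 1, 0, 0]])
preds = np.array([[0.9, 0.8, 0.1, 0.0], [0.2, 0.7, 0.1, 0.9]])
print(precision_at_k(truth, preds, 3))
print(ndcg_at_k(truth, preds, 3))
```

The propensity-scored variants reweight each hit by an inverse label propensity. They are omitted here because the weights depend on dataset-specific parameters.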

Tools

  • sparse/dense: topk, rank, binarize, sigmoid, normalize, etc.
  • dense: topk, binarize, sigmoid, normalize, etc.
  • shortlist: Shortlist, ShortlistCentroids, ShortlistInstances, etc.
  • analysis: compare_predictions, compare_nearest_neighbors, etc.
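
As a rough picture of what operations such as top-k retention and binarization do to a sparse score matrix, here is a small SciPy sketch. It does not use or reproduce the xclib utilities. `retain_topk_rows` and `binarize_sparse` are hypothetical names chosen for this example.

```python
import numpy as np
from scipy.sparse import csr_matrix


def retain_topk_rows(matrix, k):
    """Keep only the k largest stored values in each row of a sparse matrix."""
    matrix = csr_matrix(matrix)
    data, indices, indptr = [], [], [0]
    for i in range(matrix.shape[0]):
        start, end = matrix.indptr[i], matrix.indptr[i + 1]
        row_data = matrix.data[start:end]
        row_cols = matrix.indices[start:end]
        # Positions of the k largest values, then restored to storage order
        keep = np.sort(np.argsort(-row_data, kind="stable")[:k])
        data.extend(row_data[keep])
        indices.extend(row_cols[keep])
        indptr.append(len(data))
    return csr_matrix((data, indices, indptr), shape=matrix.shape)


def binarize_sparse(matrix):
    """Set every stored entry to 1, leaving the sparsity pattern unchanged."""
    out = csr_matrix(matrix, copy=True)
    out.data[:] = 1
    return out


scores = csr_matrix(np.array([[0.1, 0.9, 0.0, 0.5], [0.3, 0.0, 0.2, 0.0]]))
top2 = retain_topk_rows(scores, 2)
top2_binary = binarize_sparse(top2)
```

Working row by row on the CSR arrays avoids densifying the matrix, which matters when the label space has hundreds of thousands of columns.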