Weka Packages

You can install the following packages from the Weka (developer version) Package Manager.

Incremental Wrapper Subset Selection

This attribute selector is specifically designed to handle high-dimensional datasets. It first creates a ranking of the attributes based on the selected metric, and then runs an Incremental Wrapper Subset Selection over that ranking (linear complexity), selecting attributes (using the WrapperSubsetEval class) which improve performance in at least a given minimum number of folds of the wrapper cross-validation. The theta option tunes an early-stopping criterion (sublinear complexity). The replaceSelection option tests, at each step of the incremental search, swapping an already selected attribute with the current candidate; this reduces the mean number of selected attributes without decreasing performance, but it raises the complexity from linear to quadratic. See “Pablo Bermejo, Jose A. Gamez, Jose M. Puerta (2011). Improving Incremental Wrapper-Based Subset Selection via Replacement and Early Stopping. International Journal of Pattern Recognition and Artificial Intelligence. 25(5):605-625.”
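
The package's Java class name is not spelled out above, so the sketch below is only a hedged usage example: it assumes the package contributes an ASSearch implementation (the name weka.attributeSelection.IWSS is a guess) and wires it to Weka's standard AttributeSelection and WrapperSubsetEval classes mentioned in the description. The theta and replaceSelection options would be set on that class once its real name and setters are known.

    import weka.attributeSelection.ASSearch;
    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.WrapperSubsetEval;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class IWSSExample {
        public static void main(String[] args) throws Exception {
            // Load a high-dimensional dataset; the last attribute is the class.
            Instances data = DataSource.read("dataset.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Wrapper evaluation: classifier accuracy estimated by internal cross-validation.
            WrapperSubsetEval eval = new WrapperSubsetEval();
            eval.setClassifier(new NaiveBayes());
            eval.setFolds(5);

            // "weka.attributeSelection.IWSS" is an assumed class name for this package's
            // search strategy; replace it with the class the installed package provides.
            ASSearch search = (ASSearch) Class.forName("weka.attributeSelection.IWSS")
                    .getDeclaredConstructor().newInstance();

            AttributeSelection selector = new AttributeSelection();
            selector.setEvaluator(eval);
            selector.setSearch(search);
            selector.SelectAttributes(data);
            System.out.println(java.util.Arrays.toString(selector.selectedAttributes()));
        }
    }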

Reranking Search

Meta-search algorithm. It first creates a univariate ranking of all attributes, in decreasing order, given an information-theory-based AttributeEvaluator; then the ranking is split into blocks of size B, and an ASSearch is run on the first block. Given the attributes selected so far, the rest of the ranking is re-ranked based on the conditional information gain of each attribute given those selected attributes. The ASSearch is then run again on the current first block, and so on. The search stops when no attribute is selected in the current block. For more information, see “Pablo Bermejo et al. Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking. Knowledge-Based Systems. Volume 25, Issue 1. February 2012”.
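
The key step is the conditional information gain used for re-ranking. The snippet below is a minimal sketch of that criterion for nominal attributes with no missing values, estimating IG(X; C | S) = H(C | S) - H(C | S, X) by counting over the joint configurations of the already selected attributes S; the package's own estimator may differ.

    import java.util.HashMap;
    import java.util.Map;
    import weka.core.Instance;
    import weka.core.Instances;

    public class ConditionalIG {

        /** IG(X; C | S): gain of nominal attribute x on the class, conditioned on the
            joint configuration of the already selected attributes. */
        public static double conditionalInfoGain(Instances data, int x, int[] selected) {
            int nc = data.numClasses();
            double n = data.numInstances();
            Map<String, double[]> byS = new HashMap<>();   // class counts per configuration of S
            Map<String, double[]> bySX = new HashMap<>();  // class counts per configuration of (S, X)

            for (Instance inst : data) {
                StringBuilder key = new StringBuilder();
                for (int a : selected) {
                    key.append((int) inst.value(a)).append(',');
                }
                int c = (int) inst.classValue();
                byS.computeIfAbsent(key.toString(), k -> new double[nc])[c]++;
                key.append('|').append((int) inst.value(x));
                bySX.computeIfAbsent(key.toString(), k -> new double[nc])[c]++;
            }
            return condEntropy(byS, n) - condEntropy(bySX, n);  // H(C|S) - H(C|S,X)
        }

        // H(C | K), where each map entry holds the class counts for one configuration of K.
        private static double condEntropy(Map<String, double[]> counts, double n) {
            double h = 0;
            for (double[] row : counts.values()) {
                double rowSum = 0;
                for (double c : row) {
                    rowSum += c;
                }
                for (double c : row) {
                    if (c > 0) {
                        h -= (c / n) * (Math.log(c / rowSum) / Math.log(2));
                    }
                }
            }
            return h;
        }
    }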

Distribution Based Balance

Re-samples with replacement the instances tagged with the class labels specified by the user. Sampling of instances is performed following a distribution learned for each <attribute, class label> pair. For more information, see “Pablo Bermejo et al. (2011). Improving the performance of Naive Bayes Multinomial in e-mail foldering by introducing distribution-based balance of datasets. Expert Systems With Applications. 38(3):2072-2080.”
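
Only the package name is given above, so the following usage sketch assumes the package ships a standard Weka instance filter (weka.filters.supervised.instance.DistributionBasedBalance is a guessed class name) and applies it through the usual Filter API; the options for choosing which class labels to re-sample depend on the filter's own setters.

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;

    public class BalanceExample {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("emails.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Guessed class name for the package's filter; replace it with the class the
            // installed package actually provides, and set its label options as needed.
            Filter balance = (Filter) Class
                    .forName("weka.filters.supervised.instance.DistributionBasedBalance")
                    .getDeclaredConstructor().newInstance();
            balance.setInputFormat(data);

            Instances balanced = Filter.useFilter(data, balance);
            System.out.println("Instances before/after: "
                    + data.numInstances() + " / " + balanced.numInstances());
        }
    }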

Information Gain Evaluator

Computes the IG between a numerical attribute X and a multinomial class C without performing any discretization. See “Supervised classification with conditional Gaussian networks: Increasing the structure complexity from Naive Bayes. Aritz Perez, Pedro Larrañaga, Iñaki Inza.”
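
The reference estimates this IG under a conditional Gaussian model. The sketch below is a simplified illustration of that idea, not the package's code: it assumes the attribute is roughly Gaussian overall and within each class, with no missing values and non-zero variances, and uses IG(X; C) = H(X) - H(X | C), where the constant 0.5*log2(2*pi*e) terms of the Gaussian differential entropies cancel.

    import weka.core.Instance;
    import weka.core.Instances;

    public class GaussianInfoGain {

        /** Approximate IG(X; C) for numeric attribute x:
            0.5*log2(var(X)) - sum_c p(c) * 0.5*log2(var(X | c)). */
        public static double infoGain(Instances data, int x) {
            int nc = data.numClasses();
            double n = data.numInstances();

            // Sufficient statistics for the overall and per-class Gaussians.
            double[] count = new double[nc], sum = new double[nc], sumSq = new double[nc];
            double totSum = 0, totSumSq = 0;
            for (Instance inst : data) {
                double v = inst.value(x);
                int c = (int) inst.classValue();
                count[c]++; sum[c] += v; sumSq[c] += v * v;
                totSum += v; totSumSq += v * v;
            }

            double totVar = totSumSq / n - Math.pow(totSum / n, 2);
            double ig = 0.5 * log2(totVar);                         // entropy term of X
            for (int c = 0; c < nc; c++) {
                if (count[c] == 0) continue;
                double var = sumSq[c] / count[c] - Math.pow(sum[c] / count[c], 2);
                ig -= (count[c] / n) * 0.5 * log2(var);             // minus H(X | C)
            }
            return ig;
        }

        private static double log2(double v) { return Math.log(v) / Math.log(2); }
    }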