Releases: rasbt/mlxtend
Version 0.17.1
New Features
- The `SequentialFeatureSelector` now supports using pre-specified feature sets via the `fixed_features` parameter. (#578)
- Adds a new `accuracy_score` function to `mlxtend.evaluate` for computing basic classification accuracy, per-class accuracy, and average per-class accuracy. (#624 via Deepan Das)
- `StackingClassifier` and `StackingCVClassifier` now have a `decision_function` method, which serves as a preferred choice over `predict_proba` in calculating roc_auc and average_precision scores when the meta estimator is a linear model or support vector classifier. (#634 via Qiang Gu)
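The three accuracy flavors listed above can be illustrated with a minimal pure-Python sketch. It is not mlxtend's `accuracy_score` implementation: the helper names are hypothetical, and the sketch assumes that "per-class accuracy" means one-vs-rest binary accuracy for a chosen positive class, with the average taken over all classes.

```python
def standard_accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_accuracy(y_true, y_pred, pos_label):
    """Accuracy after collapsing labels to `pos_label` vs. all other classes."""
    hits = sum((t == pos_label) == (p == pos_label) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def average_per_class_accuracy(y_true, y_pred):
    """Mean of the one-vs-rest binary accuracies over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(binary_accuracy(y_true, y_pred, c) for c in labels) / len(labels)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(standard_accuracy(y_true, y_pred))             # 4 of 6 correct
print(binary_accuracy(y_true, y_pred, pos_label=1))  # class 1 vs. rest
print(average_per_class_accuracy(y_true, y_pred))
```

Averaging the one-vs-rest accuracies gives every class equal weight, which is why this variant is often reported alongside plain accuracy on imbalanced data.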
Changes
- Improve the runtime performance of the `apriori` frequent itemset generating function when `low_memory=True`. Setting `low_memory=False` (default) is still faster for small itemsets, but `low_memory=True` can be much faster for large itemsets and requires less memory. Also, input validation for `apriori`, `fpgrowth` and `fpmax` takes a significant amount of time when the input pandas DataFrame is large; this is now dramatically reduced when the input contains boolean values (and not zeros/ones), which is the case when using `TransactionEncoder`. (#619 via Denis Barbier)
- Add support for newer sparse pandas DataFrames for frequent itemset algorithms. Also, input validation for `apriori`, `fpgrowth` and `fpmax` runs much faster on sparse DataFrames when the input pandas DataFrame contains integer values. (#621 via Denis Barbier)
- Let `fpgrowth` and `fpmax` work directly on sparse DataFrames; previously, they were converted into dense NumPy arrays. (#622 via Denis Barbier)
Bug Fixes
- Fixes a bug in `mlxtend.plotting.plot_pca_correlation_graph` that caused the explained variances not to sum up to 1. Also improves the runtime performance of the correlation computation and adds a missing function argument for the explained variances (eigenvalues) if users provide their own principal components. (#593 via Gabriel Azevedo Ferreira)
- The behavior of `fpgrowth` and `apriori` is now consistent for edge cases such as `min_support=0`. (#573 via Steve Harenberg)
- `fpmax` now returns an empty DataFrame instead of raising an error if the frequent itemset set is empty. (#573 via Steve Harenberg)
- Fixes an issue in `mlxtend.plotting.plot_confusion_matrix`, where the font-color choice for medium-dark cells was not ideal and hard to read. (#588 via sohrabtowfighi)
- The `svd` mode of `mlxtend.feature_extraction.PrincipalComponentAnalysis` now also uses n-1 degrees of freedom instead of n d.o.f. when computing the eigenvalues, to match the behavior of `eigen`. (#595)
- Disable input validation for `StackingCVClassifier` because it causes issues if pipelines are used as input. (#606)
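The degrees-of-freedom fix, and the zero-padding fix for the `'svd'` mode (#565), can both be seen in a small NumPy sketch that computes PCA eigenvalues two ways. This illustrates the linear algebra only; it is not the code inside `PrincipalComponentAnalysis`, and the function names are hypothetical.

```python
import numpy as np


def eigenvalues_via_eigen(X):
    """Covariance-matrix eigenvalues (n - 1 degrees of freedom), largest first."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    return np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order


def eigenvalues_via_svd(X):
    """The same eigenvalues from the singular values of the centered data.

    Squared singular values are divided by n - 1 (not n) so that they match
    the covariance route. With fewer samples than features, the SVD yields
    only min(n, d) values, so the rest are padded with zeros.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # min(n, d) values, descending
    vals = s ** 2 / (n - 1)
    return np.concatenate([vals, np.zeros(d - vals.size)])


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
ev_eigen = eigenvalues_via_eigen(X)
ev_svd = eigenvalues_via_svd(X)
explained = ev_svd / ev_svd.sum()  # explained variance ratios
print(np.allclose(ev_eigen, ev_svd), explained.sum())
```

Because the ratios are normalized by their own sum, they add up to 1 whichever divisor is used; the n versus n-1 choice only matters when the eigenvalues themselves are reported or compared across modes.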
Version 0.17.0
New Features
- Added an enhancement to the existing `iris_data()` such that both the UCI Repository version of the Iris dataset and the corrected, original version can be loaded; the two differ slightly in two data points (the original version is consistent with Fisher's paper and matches the version in R). (#539 via janismdhanbad)
- Added an optional `groups` parameter to the `fit()` methods of `SequentialFeatureSelector` and `ExhaustiveFeatureSelector` for forwarding to scikit-learn CV. (#537 via arc12)
- Added a new `plot_pca_correlation_graph` function to the `mlxtend.plotting` submodule for plotting a PCA correlation graph. (#544 via Gabriel-Azevedo-Ferreira)
- Added a `zoom_factor` parameter to the `mlxtend.plotting.plot_decision_regions` function that allows users to zoom in and out of the decision region plots. (#545)
- Added a function `fpgrowth` that implements the FP-Growth algorithm for mining frequent itemsets as a drop-in replacement for the existing `apriori` algorithm. (#550 via Steve Harenberg)
- New `heatmap` function in `mlxtend.plotting`. (#552)
- Added a function `fpmax` that implements the FP-Max algorithm for mining maximal itemsets as a drop-in replacement for the `fpgrowth` algorithm. (#553 via Steve Harenberg)
- New `figsize` parameter for the `plot_decision_regions` function in `mlxtend.plotting`. (#555 via Mirza Hasanbasic)
- New `low_memory` option for the `apriori` frequent itemset generating function. Setting `low_memory=False` (default) uses a substantially optimized version of the algorithm that is 3-6x faster than the original implementation (`low_memory=True`). (#567 via jmayse)
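For readers new to frequent itemset mining, the following pure-Python sketch shows what `apriori` computes (support-based join and prune steps, with an optional `max_len` cap) and how maximal itemsets, the output of `fpmax`, relate to it. It is a textbook Apriori for illustration only: it does not reproduce mlxtend's `low_memory` code paths or its DataFrame interface, and the function names are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support, max_len=None):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(itemset <= b for b in baskets) / n

    items = sorted({item for b in baskets for item in b})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in level}
    k = 1
    while level and (max_len is None or k < max_len):
        k += 1
        previous = set(level)
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        ]
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
    return frequent


def maximal_itemsets(frequent):
    """Keep only itemsets without a frequent proper superset (what FP-Max mines)."""
    return {s: v for s, v in frequent.items() if not any(s < other for other in frequent)}


transactions = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread", "eggs"], ["bread"]]
frequent = apriori_sketch(transactions, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
print(maximal_itemsets(frequent))
```

FP-Growth and FP-Max reach the same (respectively, the maximal) itemsets without generating candidates level by level, which is why they can serve as drop-in replacements.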
Changes
- Now uses the latest joblib library under the hood for multiprocessing instead of `sklearn.externals.joblib`. (#547)
- Changes to `StackingCVClassifier` and `StackingCVRegressor` such that first-level models are allowed to generate output of non-numeric type. (#562)
Bug Fixes
- Fixed documentation of `iris_data()` in `iris.py` by adding a note about differences in the iris data in R and the UCI machine learning repository.
- Make sure that if the `'svd'` mode is used in PCA, the number of eigenvalues is the same as when using `'eigen'` (appending zeros in that case). (#565)
Version 0.16.0
New Features
- `StackingCVClassifier` and `StackingCVRegressor` now support a `random_state` parameter, which, together with `shuffle`, controls the randomness in the CV splitting. (#523 via Qiang Gu)
- `StackingCVClassifier` and `StackingCVRegressor` now have a new `drop_last_proba` parameter. If `True`, it drops the last "probability" column in the feature set because it is redundant: `p(y_c) = 1 - [p(y_1) + p(y_2) + ... + p(y_{c-1})]`. This can be useful for meta-classifiers that are sensitive to perfectly collinear features. (#532)
- Other stacking estimators, including `StackingClassifier`, `StackingCVClassifier` and `StackingRegressor`, support grid search over the base estimators (`classifiers` or `regressors`) and even a single base estimator. (#522 via Qiang Gu)
- Adds multiprocessing support to `StackingCVClassifier`. (#522 via Qiang Gu)
- Adds multiprocessing support to `StackingCVRegressor`. (#512 via Qiang Gu)
- Now, the `StackingCVRegressor` also enables grid search over the `regressors` and even a single base regressor. When there are level-mixed parameters, `GridSearchCV` will try to replace hyperparameters in a top-down order (see the documentation for example details). (#515 via Qiang Gu)
- Adds a `verbose` parameter to `apriori` to show the current iteration number as well as the itemset size currently being sampled. (#519)
- Adds an optional `class_names` parameter to the confusion matrix function to display class names on the axis as tick marks. (#487 via sandpiturtle)
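Why the last probability column is redundant can be checked with a small sketch: once the first c-1 class probabilities are known, the last one is fully determined, so keeping it adds a perfectly collinear feature. The helper names below are hypothetical and not part of mlxtend; in the library, this behavior is controlled by the `drop_last_proba` parameter.

```python
def drop_last_proba(proba_rows):
    """Drop the last class-probability column from each row."""
    return [row[:-1] for row in proba_rows]


def recover_last_proba(reduced_rows):
    """Reconstruct the dropped column: p(y_c) = 1 - sum of the other probabilities."""
    return [1.0 - sum(row) for row in reduced_rows]


proba = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # each row sums to 1
reduced = drop_last_proba(proba)
print(reduced)                      # two columns per row
print(recover_last_proba(reduced))  # approximately [0.1, 0.6]
```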
Changes
- Due to new features, restructuring, and better scikit-learn support (for `GridSearchCV`, etc.), the `StackingCVRegressor`'s meta-regressor is now accessed via `'meta_regressor__*'` in the parameter grid. E.g., if a `RandomForestRegressor` as meta-regressor was previously tuned via `'randomforestregressor__n_estimators'`, this has now changed to `'meta_regressor__n_estimators'`. (#515 via Qiang Gu)
- The same change is now applied to other stacking estimators, including `StackingClassifier`, `StackingCVClassifier` and `StackingRegressor`. (#522 via Qiang Gu)
Bug Fixes
- The `feature_selection.ColumnSelector` now also supports column names of type `int` (in addition to `str` names) if the input is a pandas DataFrame. (#500 via tetrar124)
- Fix unreadable labels in `plot_confusion_matrix` for imbalanced datasets if `show_absolute=True` and `show_normed=True`. (#504)
- Raises a more informative error if a `SparseDataFrame` is passed to `apriori` and the DataFrame has integer column names that don't start with `0`, due to current limitations of the `SparseDataFrame` implementation in pandas. (#503)
- `SequentialFeatureSelector` now supports DataFrames as input for all operating modes (forward/backward/floating). (#506)
- `mlxtend.evaluate.feature_importance_permutation` now correctly accepts scoring functions with proper function signature as `metric` argument. (#528)
Version 0.15.0
New Features
- Adds a new transformer class to `mlxtend.image`, `EyepadAlign`, that aligns face images based on the location of the eyes. (#466 by Vahid Mirjalili)
- Adds a new function, `mlxtend.evaluate.bias_variance_decomp`, that decomposes the loss of a regressor or classifier into bias and variance terms. (#470)
- Adds a `whitening` parameter to `PrincipalComponentAnalysis` to optionally whiten the transformed data such that the features have unit variance. (#475)
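Under squared-error loss, the decomposition reduces to averaging, over test points, the squared distance between the mean prediction and the target (bias) and the spread of the predictions around their mean (variance). The sketch below assumes you already have predictions from several models trained on resampled data; it illustrates that identity and is not the `bias_variance_decomp` implementation, which, per the note above, also covers classifiers. The function name is hypothetical.

```python
def bias_variance_squared(predictions, y_true):
    """Decompose mean squared error into (squared) bias and variance.

    predictions: one list per training round, each holding a prediction for
    every test point (e.g., from models fit on different bootstrap samples).
    Returns (expected_loss, bias, variance), each averaged over test points.
    """
    n_rounds = len(predictions)
    n_points = len(y_true)
    loss = bias = var = 0.0
    for j in range(n_points):
        preds = [predictions[r][j] for r in range(n_rounds)]
        mean_pred = sum(preds) / n_rounds
        loss += sum((p - y_true[j]) ** 2 for p in preds) / n_rounds
        bias += (mean_pred - y_true[j]) ** 2
        var += sum((p - mean_pred) ** 2 for p in preds) / n_rounds
    return loss / n_points, bias / n_points, var / n_points


predictions = [[1.0, 2.0], [3.0, 2.0]]  # 2 training rounds x 2 test points
y_true = [2.0, 1.0]
loss, bias, var = bias_variance_squared(predictions, y_true)
print(loss, bias, var)  # 1.0 0.5 0.5
```

For squared loss the identity `loss = bias + variance` holds exactly, which the example reproduces.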
Changes
- Changed the default solver in `PrincipalComponentAnalysis` to `'svd'` instead of `'eigen'` to improve numerical stability. (#474)
- The `mlxtend.image.extract_face_landmarks` now returns `None` if no facial landmarks were detected instead of an array of all zeros. (#466)
Bug Fixes
Version 0.14.0
New Features
- Added a `scatterplotmatrix` function to the `plotting` module. (#437)
- Added a `sample_weight` option to `StackingRegressor`, `StackingClassifier`, `StackingCVRegressor`, `StackingCVClassifier`, and `EnsembleVoteClassifier`. (#438)
- Added a `RandomHoldoutSplit` class to perform a random train/valid split without rotation in `SequentialFeatureSelector`, scikit-learn `GridSearchCV` etc. (#442)
- Added a `PredefinedHoldoutSplit` class to perform a train/valid split, based on user-specified indices, without rotation in `SequentialFeatureSelector`, scikit-learn `GridSearchCV` etc. (#443)
- Created a new `mlxtend.image` submodule for working on image processing-related tasks. (#457)
- Added a new convenience function `extract_face_landmarks` based on `dlib` to `mlxtend.image`. (#458)
- Added a `method='oob'` option to the `mlxtend.evaluate.bootstrap_point632_score` method to compute the classic out-of-bag bootstrap estimate. (#459)
- Added a `method='.632+'` option to the `mlxtend.evaluate.bootstrap_point632_score` method to compute the .632+ bootstrap estimate that addresses the optimism bias of the .632 bootstrap. (#459)
- Added a new `mlxtend.evaluate.ftest` function to perform an F-test for comparing the accuracies of two or more classification models. (#460)
- Added a new `mlxtend.evaluate.combined_ftest_5x2cv` function to perform a combined 5x2cv F-test for comparing the performance of two models. (#461)
- Added a new `mlxtend.evaluate.difference_proportions` test for comparing two proportions (e.g., classifier accuracies). (#462)
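Conceptually, a holdout splitter like the two classes above behaves like a scikit-learn cross-validation splitter that yields exactly one (train, validation) pair, so there is no rotation of folds. The sketch below follows the `split()`/`get_n_splits()` protocol of scikit-learn splitters; the class names and the `valid_size`/`random_seed`/`valid_indices` parameters are hypothetical stand-ins, not mlxtend's signatures. Real scikit-learn splitters yield NumPy index arrays; lists are used here only to keep the sketch dependency-free.

```python
import random


class RandomHoldoutSplitSketch:
    """Single random train/validation split (hypothetical)."""

    def __init__(self, valid_size=0.5, random_seed=None):
        self.valid_size = valid_size
        self.random_seed = random_seed

    def split(self, X, y=None, groups=None):
        indices = list(range(len(X)))
        random.Random(self.random_seed).shuffle(indices)
        n_valid = int(len(indices) * self.valid_size)
        # Exactly one (train, validation) pair is yielded.
        yield sorted(indices[n_valid:]), sorted(indices[:n_valid])

    def get_n_splits(self, X=None, y=None, groups=None):
        return 1


class PredefinedHoldoutSplitSketch:
    """Single split that uses user-specified validation indices (hypothetical)."""

    def __init__(self, valid_indices):
        self.valid_indices = sorted(valid_indices)

    def split(self, X, y=None, groups=None):
        valid = set(self.valid_indices)
        train = [i for i in range(len(X)) if i not in valid]
        yield train, list(self.valid_indices)

    def get_n_splits(self, X=None, y=None, groups=None):
        return 1


X = list(range(10))
train, valid = next(RandomHoldoutSplitSketch(valid_size=0.3, random_seed=1).split(X))
print(len(train), len(valid))  # 7 3
print(next(PredefinedHoldoutSplitSketch([0, 1]).split(X)))
```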
Changes
- Addressed deprecation warnings in NumPy 1.15. (#425)
- Because of complications in PR #459, support for Python 2.7 was dropped; since official support for Python 2.7 by the Python Software Foundation ends in approx. 12 months anyway, this refocusing will hopefully free up some developer time by not having to worry about backward compatibility.
Bug Fixes
- Fixed an issue with a missing import in `mlxtend.plotting.plot_confusion_matrix`. (#428)
Version 0.13.0 (07/20/2018)
New Features
- A meaningful error message is now raised when a cross-validation generator is used with `SequentialFeatureSelector`. (#377)
- The `SequentialFeatureSelector` now accepts custom feature names via the `fit` method for more interpretable feature subset reports. (#379)
- The `SequentialFeatureSelector` is now also compatible with pandas DataFrames and uses DataFrame column names for more interpretable feature subset reports. (#379)
- `ColumnSelector` now works with pandas DataFrame columns. (#378 by Manuel Garrido)
- The `ExhaustiveFeatureSelector` estimator in `mlxtend.feature_selection` can now be safely stopped mid-process with Ctrl+C. (#380)
- Two new functions, `vectorspace_orthonormalization` and `vectorspace_dimensionality`, were added to `mlxtend.math` to use the Gram-Schmidt process to convert a set of linearly independent vectors into a set of orthonormal basis vectors, and to compute the dimensionality of a vector space, respectively. (#382)
- `mlxtend.frequent_patterns.apriori` now supports pandas `SparseDataFrame`s to generate frequent itemsets. (#404 via Daniel Morales)
- The `plot_confusion_matrix` function now has the ability to show normalized confusion matrix coefficients in addition to or instead of absolute confusion matrix coefficients, with or without a colorbar. The text display method has been changed so that the full range of the colormap is used. The default size is also now set based on the number of classes.
- Added support for merging the meta features with the original input features in `StackingRegressor` (via `use_features_in_secondary`) like it is already supported in the other Stacking classes. (#418)
- Added a `support_only` parameter to the `association_rules` function, which allows constructing association rules (based on the support metric only) for cropped input DataFrames that don't contain a complete set of antecedent and consequent support values. (#421)
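The Gram-Schmidt process behind the two `mlxtend.math` functions above can be sketched in a few lines of plain Python. This sketch uses the modified Gram-Schmidt variant and an absolute tolerance to drop linearly dependent vectors; both are choices of the sketch, not details taken from `mlxtend.math`. Counting the retained vectors gives the dimensionality of the spanned space.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-10):
    """Modified Gram-Schmidt: orthonormal basis vectors spanning `vectors`.

    Vectors that are (numerically) linear combinations of earlier ones are
    skipped, so len(result) is the dimensionality of the spanned vector space.
    """
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            # Subtract the projection of the current residual onto each basis vector.
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


vectors = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
basis = gram_schmidt(vectors)
for b in basis:
    print([round(x, 4) for x in b])
print(len(gram_schmidt([[1, 0], [2, 0]])))  # dimensionality 1
```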
Changes
- Itemsets generated with `apriori` are now `frozenset`s. (#393 by William Laney and #394)
- Now raises an error if an input DataFrame to `apriori` contains values other than 0, 1, True, and False. (#419)
Bug Fixes
- Allow mlxtend estimators to be cloned via scikit-learn's `clone` function. (#374)
- Fixes a bug to allow the correct use of `refit=False` in `StackingRegressor` and `StackingCVRegressor`. (#384 and #385 by selay01)
- Allow `StackingClassifier` to work with sparse matrices when `use_features_in_secondary=True`. (#408 by Floris Hoogenbook)
- Allow `StackingCVRegressor` to work with sparse matrices when `use_features_in_secondary=True`. (#416)
- Allow `StackingCVClassifier` to work with sparse matrices when `use_features_in_secondary=True`. (#417)
Version 0.12.0
New Features
- A new `feature_importance_permutation` function to compute the feature importance in classifiers and regressors via the permutation importance method. (#358)
- The fit method of the `ExhaustiveFeatureSelector` now optionally accepts `**fit_params` for the estimator that is used for the feature selection. (#354 by Zach Griffith)
- The fit method of the `SequentialFeatureSelector` now optionally accepts `**fit_params` for the estimator that is used for the feature selection. (#350 by Zach Griffith)
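The permutation importance method is simple enough to sketch with the standard library: score a fitted model once, then shuffle one feature column at a time and record how much the score drops. The function, the `accuracy` helper, and the toy model below are hypothetical and only illustrate the method; they do not reproduce the signature of `feature_importance_permutation`.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric, num_rounds=5, seed=0):
    """Mean drop in `metric` when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(num_rounds):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / num_rounds)
    return importances


def predict(X):
    """Toy 'model': the label depends only on feature 0; feature 1 is ignored."""
    return [1 if row[0] > 0.5 else 0 for row in X]


X = [[i / 20, (7 * i % 20) / 20] for i in range(20)]
y = predict(X)
print(permutation_importance(predict, X, y, accuracy))
```

The ignored feature gets an importance of exactly zero because shuffling it never changes a prediction, while shuffling the informative feature lowers the accuracy.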
Changes
- Replaced the `plot_decision_regions` colors with a colorblind-friendly palette and added contour lines for decision regions. (#348)
- All stacking estimators now raise `NotFittedError`s if any method for inference is called prior to fitting the estimator. (#353)
- Renamed the `refit` parameter of both the `StackingClassifier` and `StackingCVClassifier` to `use_clones` to be more explicit and less misleading. (#368)
Bug Fixes
- Various changes in the documentation and documentation tools to fix formatting issues (#363)
- Fixes a bug where the `StackingCVClassifier`'s meta features were not stored in the original order when `shuffle=True`. (#370)
- Many documentation improvements, including links to the User Guides in the API docs. (#371)
Version 0.11.0
New Features
- New function implementing the resampled paired t-test procedure (`paired_ttest_resampled`) to compare the performance of two models (also called k-hold-out paired t-test). (#323)
- New function implementing the k-fold paired t-test procedure (`paired_ttest_kfold_cv`) to compare the performance of two models. (#324)
- New function implementing the 5x2cv paired t-test procedure (`paired_ttest_5x2cv`) proposed by Dietterich (1998) to compare the performance of two models. (#325)
- A `refit` parameter was added to stacking classes (similar to the `refit` parameter in the `EnsembleVoteClassifier`) to support classifiers and regressors that follow the scikit-learn API but are not compatible with scikit-learn's `clone` function. (#325)
- The `ColumnSelector` now has a `drop_axis` argument to use it in pipelines with `CountVectorizers`. (#333)
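The 5x2cv paired t-test reduces to a short formula over the ten performance differences collected from five repetitions of 2-fold cross-validation; for comparison, the sketch also computes the combined 5x2cv F statistic (commonly attributed to Alpaydin, 1999) that was added in 0.14.0. Only the test statistics are computed: turning them into p-values would additionally need the t and F distributions (e.g., from SciPy). This is not the code of `paired_ttest_5x2cv` or `combined_ftest_5x2cv`, and the function name is hypothetical.

```python
import math


def five_by_two_statistics(diffs):
    """5x2cv paired t statistic and combined 5x2cv F statistic.

    diffs: five (p1, p2) pairs; p1 and p2 are the performance differences
    between the two models on the two folds of one 2-fold CV repetition.
    """
    if len(diffs) != 5:
        raise ValueError("expected five (p1, p2) pairs")
    # Per-repetition variance estimate s_i^2 = (p1 - mean)^2 + (p2 - mean)^2.
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    # t: only the first fold of the first repetition enters the numerator (5 d.o.f.).
    t = diffs[0][0] / math.sqrt(sum(variances) / 5)
    # Combined F: all ten squared differences enter the numerator (10 and 5 d.o.f.).
    f = sum(p1 ** 2 + p2 ** 2 for p1, p2 in diffs) / (2 * sum(variances))
    return t, f


diffs = [(0.02, 0.04), (0.01, 0.03), (0.03, 0.01), (0.02, 0.02), (0.00, 0.04)]
t, f = five_by_two_statistics(diffs)
print(round(t, 3), round(f, 3))  # 1.195 2.286
```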
Changes
- Raises an informative error message if `predict` or `predict_meta_features` is called prior to calling the `fit` method in `StackingRegressor` and `StackingCVRegressor`. (#315)
- The `plot_decision_regions` function now automatically determines the optimal setting based on the feature dimensions and supports anti-aliasing. The old `res` parameter has been deprecated. (#309 by Guillaume Poirier-Morency)
- Apriori code is faster due to optimization in the onehot transformation and in the number of candidates generated by the `apriori` algorithm. (#327 by Jakub Smid)
- The `OnehotTransactions` class (which is often used in combination with the `apriori` function for association rule mining) is now more memory efficient as it uses boolean arrays instead of integer arrays. In addition, the `OnehotTransactions` class can now be provided with a `sparse` argument to generate sparse representations of the onehot matrix to further improve memory efficiency. (#328 by Jakub Smid)
- The `OnehotTransactions` class has been deprecated and replaced by the `TransactionEncoder`. (#332)
- The `plot_decision_regions` function now has three new parameters, `scatter_kwargs`, `contourf_kwargs`, and `scatter_highlight_kwargs`, that can be used to modify the plotting style. (#342 by James Bourbeau)
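The boolean one-hot encoding described above can be sketched without any dependencies: each transaction becomes a row of True/False flags over the sorted set of unique items, so the support of a single item is just the mean of its column. This mirrors the idea behind `OnehotTransactions`/`TransactionEncoder` but is not their implementation; the function names are hypothetical, and per the notes above the real classes produce (optionally sparse) arrays rather than lists.

```python
def encode_transactions(transactions):
    """One-hot encode transactions into boolean rows.

    Returns (columns, rows): `columns` are the sorted unique items, and
    rows[i][j] is True if transaction i contains item columns[j].
    """
    columns = sorted({item for t in transactions for item in t})
    position = {item: j for j, item in enumerate(columns)}
    rows = []
    for t in transactions:
        row = [False] * len(columns)
        for item in t:
            row[position[item]] = True
        rows.append(row)
    return columns, rows


def item_support(columns, rows, item):
    """Support of a single item: fraction of rows whose column is True."""
    j = columns.index(item)
    return sum(row[j] for row in rows) / len(rows)


columns, rows = encode_transactions([["milk", "bread"], ["bread", "eggs"], ["bread"]])
print(columns)  # ['bread', 'eggs', 'milk']
print(rows)
print(item_support(columns, rows, "bread"))  # 1.0
```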
Bug Fixes
- Fixed an issue when class labels were provided to the `EnsembleVoteClassifier` when `refit` was set to `False`. (#322)
- Allow arrays with 16-bit and 32-bit precision in the `plot_decision_regions` function. (#337)
- Fixed a bug that raised an indexing error if the number of items was <= 1 when computing association rules using the conviction metric. (#340)
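The conviction metric mentioned in the last fix compares how often a rule would be wrong if antecedent and consequent were independent with how often it is actually wrong: conviction(A → C) = (1 - support(C)) / (1 - confidence(A → C)). The following sketch computes it alongside support, confidence, and lift for a single rule. It is a plain-Python illustration, not mlxtend's `association_rules`, and the helper names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= frozenset(t) for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift, and conviction for antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    s_ac = support(transactions, set(antecedent) | set(consequent))
    confidence = s_ac / s_a
    lift = confidence / s_c
    # Conviction is unbounded when the rule always holds (confidence == 1).
    conviction = float("inf") if confidence == 1 else (1 - s_c) / (1 - confidence)
    return {"support": s_ac, "confidence": confidence, "lift": lift, "conviction": conviction}


transactions = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread", "eggs"], ["bread"]]
print(rule_metrics(transactions, ["milk"], ["bread"]))  # conviction close to 0.75
print(rule_metrics(transactions, ["eggs"], ["milk"]))   # conviction is inf
```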
Version 0.10.0
New Features
- New `store_train_meta_features` parameter for `fit` in `StackingCVRegressor`. If `True`, train meta-features are stored in `self.train_meta_features_`. New `pred_meta_features` method for `StackingCVRegressor`, which returns the test meta-features. (#294 via takashioya)
- The new `store_train_meta_features` attribute and `pred_meta_features` method for the `StackingCVRegressor` were also added to the `StackingRegressor`, `StackingClassifier`, and `StackingCVClassifier`. (#299 & #300)
- New function (`evaluate.mcnemar_tables`) for creating multiple 2x2 contingency tables from model prediction arrays that can be used in multiple McNemar (post-hoc) tests or Cochran's Q or F tests, etc. (#307)
- New function (`evaluate.cochrans_q`) for performing Cochran's Q test to compare the accuracy of multiple classifiers. (#310)
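Cochran's Q generalizes McNemar's test to k classifiers evaluated on the same examples. A minimal sketch of the statistic, starting from a per-example correctness matrix, is shown below. It is not `evaluate.cochrans_q`, and the function name is hypothetical; turning Q into a p-value would additionally need the chi-squared distribution (e.g., from SciPy).

```python
def cochrans_q(correct):
    """Cochran's Q statistic for k classifiers.

    correct: one row per test example, one 0/1 entry per classifier
    (1 = that classifier predicted the example correctly). Under the null
    hypothesis of equal accuracies, Q is approximately chi-squared
    distributed with k - 1 degrees of freedom. The denominator is zero if
    every example is either correct for all classifiers or wrong for all.
    """
    k = len(correct[0])
    col_totals = [sum(row[j] for row in correct) for j in range(k)]  # G_j
    row_totals = [sum(row) for row in correct]                       # L_i
    total = sum(col_totals)                                          # T
    numerator = (k - 1) * (k * sum(g * g for g in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    return numerator / denominator


correct = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
]
print(cochrans_q(correct))  # 1.5
```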
Changes
- Added `requirements.txt` to `setup.py`. (#304 via Colin Carrol)
Bug Fixes
Version 0.9.1 (2017-11-19)
New Features
- Added `mlxtend.evaluate.bootstrap_point632_score` to evaluate the performance of estimators using the .632 bootstrap. (#283)
- New `max_len` parameter for the frequent itemset generation via the `apriori` function to allow for early stopping. (#270)
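The .632 estimate blends the optimistic resubstitution accuracy with the pessimistic out-of-bag (OOB) accuracy. The sketch below assumes a hypothetical `fit` callable that returns a per-sample predictor, and it uses the accuracy on each bootstrap sample as the resubstitution term; other variants (possibly including mlxtend's) compute that term differently, and the `.632+` correction added in 0.14.0 is not included. The function and parameter names are illustrative only.

```python
import random


def bootstrap_point632(fit, X, y, n_splits=200, method=".632", seed=0):
    """Bootstrap accuracy estimate, either '.632' or 'oob'.

    fit(X_train, y_train) must return a predict(x) callable for one sample.
    '.632' weights the resubstitution accuracy on the bootstrap sample by 0.368
    and the out-of-bag accuracy by 0.632; 'oob' uses the OOB accuracy alone.
    """
    rng = random.Random(seed)
    n = len(X)
    scores = []
    for _ in range(n_splits):
        boot = [rng.randrange(n) for _ in range(n)]  # n indices drawn with replacement
        drawn = set(boot)
        oob = [i for i in range(n) if i not in drawn]
        if not oob:  # every example was drawn; no OOB data in this round
            continue
        predict = fit([X[i] for i in boot], [y[i] for i in boot])

        def acc(indices):
            return sum(predict(X[i]) == y[i] for i in indices) / len(indices)

        if method == "oob":
            scores.append(acc(oob))
        else:
            scores.append(0.368 * acc(boot) + 0.632 * acc(oob))
    return sum(scores) / len(scores)


def fit_sign_rule(X_train, y_train):
    """Toy 'model' that ignores the training data and predicts x > 0."""
    return lambda x: x > 0


X = [-3, -2, -1, 1, 2, 3]
y = [x > 0 for x in X]
print(bootstrap_point632(fit_sign_rule, X, y))                # close to 1.0
print(bootstrap_point632(fit_sign_rule, X, y, method="oob"))  # 1.0
```

Because the toy rule is perfect, both estimates equal 1; with a real learner, the gap between the resubstitution and OOB terms is what the 0.368/0.632 weighting balances.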
Changes
- All feature index tuples in `SequentialFeatureSelector` are now in sorted order. (#262)
- The `SequentialFeatureSelector` now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994). Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases. (#262)
- `utils.Counter` now accepts a name variable to help distinguish between multiple counters, time precision can be set with the `precision` kwarg, and the new attribute `end_time` holds the time the last iteration completed. (#278 via Mathew Savage)
Bug Fixes
- Fixed a deprecation error that occurred with the McNemar test when using SciPy 1.0. (#283)