Base class of parameter tuning

This class is intended for inheritance only, so you should not use it directly.

The tuning class of each ML estimator inherits from this class.

Note

To see the default arguments of each tuning class, see the following links:

Classification
- LGBMClassifierTuning
https://c60evaporator.github.io/tune-easy/each_estimators.html#tune_easy.lgbm_tuning.LGBMClassifierTuning
- LogisticRegressionTuning
https://c60evaporator.github.io/tune-easy/each_estimators.html#tune_easy.logisticregression_tuning.LogisticRegressionTuning
- RFClassifierTuning
https://c60evaporator.github.io/tune-easy/each_estimators.html#tune_easy.rf_tuning.RFClassifierTuning
- SVMClassifierTuning
https://c60evaporator.github.io/tune-easy/each_estimators.html#tune_easy.svm_tuning.SVMClassifierTuning
- XGBClassifierTuning
https://c60evaporator.github.io/tune-easy/each_estimators.html#tune_easy.xgb_tuning.XGBClassifierTuning
Regression
- ElasticNetTuning
https://c60evaporator.github.io/tune-easy/each_estimators.html#tune_easy.elasticnet_tuning.ElasticNetTuning
- LGBMRegressorTuning
https://c60evaporator.github.io/tune-easy/each_estimators.html#tune_easy.lgbm_tuning.LGBMRegressorTuning
- RFRegressorTuning
https://c60evaporator.github.io/tune-easy/each_estimators.html#tune_easy.rf_tuning.RFRegressorTuning
- SVMRegressorTuning
https://c60evaporator.github.io/tune-easy/each_estimators.html#tune_easy.svm_tuning.SVMRegressorTuning
- XGBRegressorTuning
https://c60evaporator.github.io/tune-easy/each_estimators.html#tune_easy.xgb_tuning.XGBRegressorTuning
- LinearRegressionTuning (no optimization, display only)
https://c60evaporator.github.io/tune-easy/each_estimators.html#tune_easy.linearregression_tuning.LinearRegressionTuning

tune_easy.param_tuning module

class tune_easy.param_tuning.ParamTuning(X, y, x_colnames, y_colname=None, cv_group=None, eval_set_selection=None, **kwargs)

Bases: object

Base class of tuning classes

This class is intended for inheritance only, so you should not use it directly.
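
The inheritance pattern can be pictured with a small sketch. The classes below are hypothetical stand-ins, not tune-easy code. They only illustrate how an argument left as None might fall back to a constant such as BAYES_PARAMS or N_ITER_BAYES defined in each tuning class.

```python
class BaseTuning:
    """Hypothetical stand-in for a tuning base class (not tune-easy's code)."""

    # Class-level defaults; concrete subclasses are expected to override them.
    BAYES_PARAMS = {}
    N_ITER_BAYES = 10

    def resolve(self, tuning_params=None, n_iter=None):
        # An argument left as None falls back to the class-level constant.
        if tuning_params is None:
            tuning_params = self.BAYES_PARAMS
        if n_iter is None:
            n_iter = self.N_ITER_BAYES
        return dict(tuning_params), n_iter


class ToyRegressorTuning(BaseTuning):
    """Hypothetical subclass that supplies estimator-specific defaults."""

    BAYES_PARAMS = {"alpha": (0.0001, 10.0)}
    N_ITER_BAYES = 40


tuner = ToyRegressorTuning()
default_params, default_iter = tuner.resolve()
custom_params, custom_iter = tuner.resolve({"alpha": (0.01, 1.0)}, n_iter=5)
```

Calling resolve() without arguments picks up the subclass constants, while explicit arguments take priority over them.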

bayes_opt_tuning(estimator=None, tuning_params=None, cv=None, seed=None, scoring=None, n_iter=None, init_points=None, acq=None, not_opt_params=None, int_params=None, param_scales=None, mlflow_logging=None, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, fit_params=None)

Run Bayesian optimization using the BayesianOptimization library.

Parameters

estimator (estimator object implementing fit, default=None) –
Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.

Note that the parameters of the estimator are overridden by not_opt_params.

If None, ESTIMATOR written in each tuning class is used.
tuning_params (dict[str, tuple(float, float)], default=None) –
Dictionary with parameter names (str) as keys and tuples of (minimum limit, maximum limit) of each parameter as values.

If None, BAYES_PARAMS written in each tuning class is used.
cv (int, cross-validation generator, or an iterable, default=5) –
Determines the cross-validation splitting strategy.

If int, it specifies the number of folds in a KFold.
seed (int, default=42) –
Seed for the random number generators of the estimator and the cross-validation splitter.

Note that “random_state” in not_opt_params is overridden by this argument.
scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification.) – Strategy to evaluate the performance of the cross-validated model on the test set.
n_iter (int, default=None) –
Number of iterations in Bayesian optimization.

If None, N_ITER_BAYES written in each tuning class is used.
init_points (int, default=None) –
Number of initial points, which are searched randomly.

If None, INIT_POINTS written in each tuning class is used.
acq ({'ei', 'pi', 'ucb'}, default='ei') – Acquisition function
not_opt_params (dict, default=None) –
Dictionary of parameters that are NOT optimized.

Note that the parameters override those of the estimator.

If None, NOT_OPT_PARAMS written in each tuning class is used.
int_params (list[str], default=None) –
List of parameters whose type is int.

If None, INT_PARAMS written in each tuning class is used.
param_scales (dict(str, {'linear', 'log'}), default=None) –
Dictionary with parameters’ scales.

If ‘linear’, the axis of result graph is drawn in linear scale.

If ‘log’, the axis of result graph is drawn in log scale.

If None, PARAM_SCALES written in each tuning class is used.
mlflow_logging (str, default=None) –
Strategy to record the result by MLflow library.

If ‘inside’, mlflow process is started in the tuning instance. So you need not use start_run() explicitly.

If ‘outside’, mlflow process is NOT started in the tuning instance. So you should use start_run() outside the tune-easy library.

If None, mlflow is not used.
mlflow_tracking_uri (str, default=None) –
Tracking uri for MLflow. This argument is passed to tracking_uri in mlflow.set_tracking_uri()

See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri
mlflow_artifact_location (str, default=None) –
Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment()

See https://mlflow.org/docs/latest/tracking.html#artifact-stores
mlflow_experiment_name (str, default=None) –
Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment()

See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment
fit_params (dict, default=None) –
Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

If None, FIT_PARAMS written in each tuning class is used.

Note that if eval_set is None, self.X and self.y are set to eval_set automatically.

Returns

best_params (dict[str, float]) – Returns best parameters determined by optimization
best_score (float) – Returns best score determined by optimization.
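
As an illustration of how tuning_params, int_params and param_scales could fit together, the sketch below maps (min, max) bounds into a search space and back. It assumes that 'log' parameters are explored in log10 space and that int_params are rounded to the nearest integer. Both are assumptions made for this example, not a statement about tune-easy's internals, and the helper names are hypothetical.

```python
import math


def to_search_space(bounds, param_scales):
    """Map (min, max) bounds to the space an optimizer would explore."""
    space = {}
    for name, (low, high) in bounds.items():
        if param_scales.get(name, "linear") == "log":
            # Assumption: log-scaled parameters are searched in log10 space.
            space[name] = (math.log10(low), math.log10(high))
        else:
            space[name] = (low, high)
    return space


def from_search_space(point, param_scales, int_params):
    """Convert a point in the search space back to estimator parameters."""
    params = {}
    for name, value in point.items():
        if param_scales.get(name, "linear") == "log":
            value = 10 ** value
        if name in int_params:
            # Assumption: integer parameters are rounded to the nearest int.
            value = int(round(value))
        params[name] = value
    return params


bounds = {"learning_rate": (0.001, 0.1), "num_leaves": (2, 50)}
scales = {"learning_rate": "log", "num_leaves": "linear"}

space = to_search_space(bounds, scales)
params = from_search_space(
    {"learning_rate": -2.0, "num_leaves": 30.6}, scales, ["num_leaves"]
)
```

Searching in log10 space spreads trials evenly across orders of magnitude, which is usually what is wanted for parameters such as a learning rate.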

get_feature_importances()

Get feature importances of the best estimator. Available only if self.estimator is RandomForest, LightGBM, or XGBoost.

Returns: df_importance – Feature importances of the best estimator as a pandas.DataFrame
Return type: pandas.DataFrame

get_search_history()

Get the high score history of the optimization as a pandas.DataFrame.

Returns: df_history – High score history of the optimization as a pandas.DataFrame
Return type: pandas.DataFrame
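
A "high score history" can be read as the best score seen so far at each iteration. The sketch below computes such a running maximum with the standard library. That interpretation, and the helper name, are assumptions for illustration. The actual columns of the returned DataFrame are not specified here.

```python
from itertools import accumulate


def best_so_far(scores):
    """Return the running maximum of a sequence of trial scores."""
    return list(accumulate(scores, max))


# Hypothetical scores of five consecutive trials (higher is better).
scores = [0.5, 0.7, 0.6, 0.9, 0.8]
history = best_so_far(scores)
```

A curve built this way never decreases, so a long flat tail suggests that further iterations are no longer improving the result.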

grid_search_tuning(estimator=None, tuning_params=None, cv=None, seed=None, scoring=None, not_opt_params=None, param_scales=None, mlflow_logging=None, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, grid_kws=None, fit_params=None)

Run grid search optimization.

Parameters

estimator (estimator object implementing fit, default=None) –
Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.

Note that the parameters of the estimator are overridden by not_opt_params.

If None, ESTIMATOR written in each tuning class is used.
tuning_params (dict[str, list(float)], default=None) –
Dictionary with parameter names (str) as keys and lists of parameter settings to try as values.

If None, CV_PARAMS_GRID written in each tuning class is used.
cv (int, cross-validation generator, or an iterable, default=5) –
Determines the cross-validation splitting strategy.

If int, it specifies the number of folds in a KFold.
seed (int, default=42) –
Seed for the random number generators of the estimator and the cross-validation splitter.

Note that “random_state” in not_opt_params is overridden by this argument.
scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification.) – Strategy to evaluate the performance of the cross-validated model on the test set.
not_opt_params (dict, default=None) –
Dictionary of parameters that are NOT optimized.

Note that the parameters override those of the estimator.

If None, NOT_OPT_PARAMS written in each tuning class is used.
param_scales (dict[str, {'linear', 'log'}], default=None) –
Dictionary with parameters’ scales.

If ‘linear’, the axis of result graph is drawn in linear scale.

If ‘log’, the axis of result graph is drawn in log scale.

If None, PARAM_SCALES written in each tuning class is used.
mlflow_logging (str, default=None) –
Strategy to record the result by MLflow library.

If ‘inside’, mlflow process is started in the tuning instance. So you need not use start_run() explicitly.

If ‘outside’, mlflow process is NOT started in the tuning instance. So you should use start_run() outside the tune-easy library.

If None, mlflow is not used.
mlflow_tracking_uri (str, default=None) –
Tracking uri for MLflow. This argument is passed to tracking_uri in mlflow.set_tracking_uri()

See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri
mlflow_artifact_location (str, default=None) –
Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment()

See https://mlflow.org/docs/latest/tracking.html#artifact-stores
mlflow_experiment_name (str, default=None) –
Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment()

See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment
grid_kws (dict, default=None) –
Additional parameters passed to sklearn.model_selection.GridSearchCV, e.g. n_jobs.

Note that estimator, param_grid, cv, and scoring CANNOT be used in this argument.

See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
fit_params (dict, default=None) –
Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

If None, FIT_PARAMS written in each tuning class is used.

Note that if eval_set is None, self.X and self.y are set to eval_set automatically.

Returns

best_params (dict[str, float]) – Returns best parameters determined by optimization
best_score (float) – Returns best score determined by optimization.
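
Because grid search tries every combination of the listed values, its cost grows multiplicatively with each added parameter. The sketch below expands a hypothetical tuning_params dictionary with itertools.product and counts the cross-validation fits for cv=5. The parameter names and values are illustrative only, and a final refit on the full data is not counted.

```python
from itertools import product


def expand_grid(tuning_params):
    """Enumerate every parameter combination of a grid as a list of dicts."""
    names = list(tuning_params)
    value_lists = [tuning_params[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical grid: 3 values of gamma x 2 values of C.
grid = {"gamma": [0.01, 0.1, 1.0], "C": [1.0, 10.0]}
candidates = expand_grid(grid)

# With 5-fold cross-validation, each candidate is fitted once per fold.
n_fits = len(candidates) * 5
```

Adding a third parameter with four values would raise the count from 6 to 24 candidates, which is why a coarse grid is usually preferable as a starting point.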

optuna_tuning(estimator=None, tuning_params=None, cv=None, seed=None, scoring=None, n_trials=None, study_kws=None, optimize_kws=None, not_opt_params=None, int_params=None, param_scales=None, mlflow_logging=None, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, fit_params=None)

Run Bayesian optimization using the Optuna library.

This method is usually faster than other tuning methods, so we recommend using it.

Parameters

estimator (estimator object implementing fit, default=None) –
Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.

Note that the parameters of the estimator are overridden by not_opt_params.

If None, ESTIMATOR written in each tuning class is used.
tuning_params (dict(str, tuple(float, float)), default=None) –
Dictionary with parameter names (str) as keys and tuples of (minimum limit, maximum limit) of each parameter as values.

If None, BAYES_PARAMS written in each tuning class is used.
cv (int, cross-validation generator, or an iterable, default=5) –
Determines the cross-validation splitting strategy.

If int, it specifies the number of folds in a KFold.
seed (int, default=42) –
Seed for the random number generators of the estimator and the cross-validation splitter.

Note that “random_state” in not_opt_params is overridden by this argument.
scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification.) – Strategy to evaluate the performance of the cross-validated model on the test set.
n_trials (int, default=None) –
Number of parameter settings that are sampled. n_trials trades off runtime vs quality of the solution.

If None, N_ITER_OPTUNA written in each tuning class is used.
study_kws (dict, default=None) –
Additional parameters passed to optuna.study.create_study, e.g. sampler.

See https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.create_study.html#optuna.study.create_study
optimize_kws (dict, default=None) –
Additional parameters passed to optuna.study.Study.optimize, e.g. n_jobs.

Note that n_trials CANNOT be used in this argument.

See https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.Study.html#optuna.study.Study.optimize
not_opt_params (dict, default=None) –
Dictionary of parameters that are NOT optimized.

Note that the parameters override those of the estimator.

If None, NOT_OPT_PARAMS written in each tuning class is used.
int_params (list[str], default=None) –
List of parameters whose type is int. These parameters are tuned by the suggest_int() method.

If None, INT_PARAMS written in each tuning class is used.
param_scales (dict[str, {'linear', 'log'}], default=None) –
Dictionary with parameters’ scales, which are passed to the log argument of suggest_float() or suggest_int().

If ‘linear’, the axis of result graph is drawn in linear scale.

If ‘log’, the axis of result graph is drawn in log scale.

If None, PARAM_SCALES written in each tuning class is used.
mlflow_logging (str, default=None) –
Strategy to record the result by MLflow library.

If ‘inside’, mlflow process is started in the tuning instance. So you need not use start_run() explicitly.

If ‘outside’, mlflow process is NOT started in the tuning instance. So you should use start_run() outside the tune-easy library.

If None, mlflow is not used.
mlflow_tracking_uri (str, default=None) –
Tracking uri for MLflow. This argument is passed to tracking_uri in mlflow.set_tracking_uri()

See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri
mlflow_artifact_location (str, default=None) –
Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment()

See https://mlflow.org/docs/latest/tracking.html#artifact-stores
mlflow_experiment_name (str, default=None) –
Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment()

See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment
fit_params (dict, default=None) –
Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

If None, FIT_PARAMS written in each tuning class is used.

Note that if eval_set is None, self.X and self.y are set to eval_set automatically.

Returns

best_params (dict[str, float]) – Returns best parameters determined by optimization
best_score (float) – Returns best score determined by optimization.
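
The interplay of int_params and param_scales with Optuna's suggest_int() and suggest_float() can be sketched as follows. RecordingTrial is a hypothetical stand-in for optuna.trial.Trial, so the example runs without Optuna installed. It only records the calls and returns the lower bound. The suggest_params helper is likewise an assumption for illustration, not tune-easy's implementation.

```python
class RecordingTrial:
    """Hypothetical stand-in for optuna.trial.Trial that records each call."""

    def __init__(self):
        self.calls = []

    def suggest_float(self, name, low, high, *, log=False):
        self.calls.append(("suggest_float", name, low, high, log))
        return float(low)  # a real trial would sample within [low, high]

    def suggest_int(self, name, low, high, *, log=False):
        self.calls.append(("suggest_int", name, low, high, log))
        return int(low)  # a real trial would sample within [low, high]


def suggest_params(trial, tuning_params, int_params, param_scales):
    """Ask the trial for one value per parameter, honoring type and scale."""
    params = {}
    for name, (low, high) in tuning_params.items():
        use_log = param_scales.get(name, "linear") == "log"
        if name in int_params:
            params[name] = trial.suggest_int(name, int(low), int(high), log=use_log)
        else:
            params[name] = trial.suggest_float(name, low, high, log=use_log)
    return params


trial = RecordingTrial()
params = suggest_params(
    trial,
    tuning_params={"learning_rate": (0.001, 0.1), "num_leaves": (2, 50)},
    int_params=["num_leaves"],
    param_scales={"learning_rate": "log", "num_leaves": "linear"},
)
```

With a real Optuna trial, the same function body could be called inside an objective function, since suggest_float() and suggest_int() accept these keyword arguments.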

plot_best_learning_curve(plot_stats='mean', ax=None)

Plot learning curve after optimization. This method is used to assess whether the optimized model is overfitting or not.

Parameters

plot_stats ({'mean', 'median'}) –
Statistic plotted as the learning curve.

If ‘mean’, mean values are plotted as a dark line and the standard deviation is filled in a light color.

If ‘median’, median values are plotted as a dark line and the range between minimum and maximum values is filled in a light color.
ax (matplotlib.axes.Axes, default=None) – Pre-existing axes for the plot.
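
The two plot_stats options summarize the per-fold scores differently. The sketch below computes the center line and the shaded band for one point of a curve. The helper name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which definition is used.

```python
import statistics


def summarize(fold_scores, plot_stats="mean"):
    """Return (center, lower, upper) for one point of a curve."""
    if plot_stats == "mean":
        center = statistics.mean(fold_scores)
        # Assumption: population standard deviation (ddof=0).
        spread = statistics.pstdev(fold_scores)
        return center, center - spread, center + spread
    if plot_stats == "median":
        return statistics.median(fold_scores), min(fold_scores), max(fold_scores)
    raise ValueError("plot_stats must be 'mean' or 'median'")


# Hypothetical scores of one model across three cross-validation folds.
fold_scores = [0.70, 0.80, 0.90]
mean_center, mean_lower, mean_upper = summarize(fold_scores, "mean")
median_center, median_lower, median_upper = summarize(fold_scores, "median")
```

The 'median' band always spans the full range of fold scores, so it tends to be wider than the 'mean' band and is more sensitive to a single outlying fold.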

plot_best_validation_curve(validation_curve_params=None, param_scales=None, plot_stats='mean', axes=None)

Plot validation curve after optimization.

This method is used to assess whether the optimized model reaches the highest point of the score.

Also, this method is used to assess whether the optimized model is overfitting or not.

Parameters

validation_curve_params (dict(str, list(float)), default=None) –
Dictionary with parameter names (str) as keys and lists of parameter values to be evaluated as values.

If None, VALIDATION_CURVE_PARAMS written in each tuning class is used.
param_scales (dict(str, {'linear', 'log'}), default=None) –
Dictionary with parameters’ scales.

If ‘linear’, the axis of result graph is drawn in linear scale.

If ‘log’, the axis of result graph is drawn in log scale.

If None, PARAM_SCALES written in each tuning class is used.
plot_stats ({'mean', 'median'}) –
Statistic plotted as the validation curve.

If ‘mean’, mean values are plotted as a dark line and the standard deviation is filled in a light color.

If ‘median’, median values are plotted as a dark line and the range between minimum and maximum values is filled in a light color.
axes (list[matplotlib.axes.Axes]) –
List of pre-existing axes for the plot.

If None, each validation curve is plotted in different figure.

plot_feature_importances(ax=None)

Plot feature importances of the best estimator. Available only if self.estimator is RandomForest, LightGBM, or XGBoost.

Parameters: ax (matplotlib.axes.Axes, default=None) – Pre-existing axes for the plot.

plot_first_validation_curve(estimator=None, validation_curve_params=None, cv=None, seed=None, scoring=None, not_opt_params=None, param_scales=None, plot_stats='mean', axes=None, fit_params=None)

Plot validation curve before optimization. This method is used to determine parameter range.

Parameters

estimator (estimator object implementing fit, default=None) –
Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.

Note that the parameters of the estimator are overridden by not_opt_params.

If None, ESTIMATOR written in each tuning class is used.
validation_curve_params (dict(str, list(float)), default=None) –
Dictionary with parameter names (str) as keys and lists of parameter values to be evaluated as values.

If None, VALIDATION_CURVE_PARAMS written in each tuning class is used.
cv (int, cross-validation generator, or an iterable, default=5) –
Determines the cross-validation splitting strategy.

If int, it specifies the number of folds in a KFold.
seed (int, default=42) –
Seed for the random number generators of the estimator and the cross-validation splitter.

Note that “random_state” in not_opt_params is overridden by this argument.
scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification.) – Strategy to evaluate the performance of the cross-validated model on the test set.
not_opt_params (dict, default=None) –
Dictionary of parameters that are NOT optimized.

Note that the parameters override those of the estimator.

If None, NOT_OPT_PARAMS written in each tuning class is used.
param_scales (dict(str, {'linear', 'log'}), default=None) –
Dictionary with parameters’ scales.

If ‘linear’, the axis of result graph is drawn in linear scale.

If ‘log’, the axis of result graph is drawn in log scale.

If None, PARAM_SCALES written in each tuning class is used.
plot_stats ({'mean', 'median'}) –
Statistic plotted as the validation curve.

If ‘mean’, mean values are plotted as a dark line and the standard deviation is filled in a light color.

If ‘median’, median values are plotted as a dark line and the range between minimum and maximum values is filled in a light color.
axes (list[matplotlib.axes.Axes]) –
List of pre-existing axes for the plot.

If None, each validation curve is plotted in different figure.
fit_params (dict, default=None) –
Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

If None, FIT_PARAMS written in each tuning class is used.

Note that if eval_set is None, self.X and self.y are set to eval_set automatically.
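
The precedence among the estimator's own parameters, not_opt_params, and seed can be sketched with plain dictionaries. The helper below is hypothetical and assumes that the seed is applied only when the estimator exposes a random_state parameter. It shows the documented override order rather than tune-easy's actual code.

```python
def resolve_estimator_params(estimator_params, not_opt_params=None, seed=42):
    """Merge parameters in the documented order of precedence."""
    # Start from the estimator's own parameters.
    params = dict(estimator_params)
    # not_opt_params override the parameters of the estimator.
    params.update(not_opt_params or {})
    # seed overrides any random_state given so far.
    # Assumption: estimators without random_state are left untouched.
    if "random_state" in params:
        params["random_state"] = seed
    return params


forest_params = resolve_estimator_params(
    {"n_estimators": 100, "random_state": None},
    not_opt_params={"n_estimators": 300, "random_state": 7},
    seed=42,
)
svm_params = resolve_estimator_params({"C": 1.0}, not_opt_params={"C": 10.0})
```

In this sketch the value 7 supplied through not_opt_params is discarded in favor of the seed, which mirrors the note that “random_state” in not_opt_params is overridden.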

plot_search_history(ax=None, x_axis='index', plot_kws=None)

Plot high score history of optimization.

Parameters

ax (matplotlib.axes.Axes, default=None) – Pre-existing axes for the plot.
x_axis (str, optional) –
Type of x axis.

If ‘index’, the iteration index is put on the x axis.

If ‘time’, the elapsed time is put on the x axis.
plot_kws (dict, optional) –
Additional parameters passed to matplotlib.axes.Axes.plot(), e.g. alpha.

See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.plot.html

plot_search_map(order=None, pair_n=4, rounddigits_title=3, rank_number=None, rounddigits_score=3, subplot_kws=None, heat_kws=None, scatter_kws=None)

Plot score map. Values of parameters are plotted as X and Y axes. Scores are plotted as color density.

If self.tuning_algo is ‘grid’, the map is plotted as heat map.

Else, the map is plotted as scatter plot.

Parameters

order (list[str], default=None) –
Axis order of parameters. The parameters are assigned to the axes in the following order: x-axis of each graph, y-axis of each graph, y-axis of all graphs, x-axis of all graphs.

If None, the axis order of parameters is determined by parameter importance which is calculated by RandomForestRegressor using parameter values as X and using score values as y.
pair_n (int, default=4) – Number of rows/columns of the maps. Available only if the number of parameters is three or more. If self.tuning_algo is ‘grid’, this argument is NOT available.
rounddigits_title (int, default=3) – Number of decimal digits to which the parameter range values displayed in graph titles are rounded. If self.tuning_algo is ‘grid’, this argument is NOT available.
rank_number (int, default=None) – Number of top-scoring data points to emphasize.
rounddigits_score (int, default=3) – Number of decimal digits to which the displayed scores (errors) of the top-ranked points are rounded.
subplot_kws (dict, default=None) –
Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize.

See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
heat_kws (dict, default=None) –
Additional parameters passed to sns.heatmap(), e.g. cmap. Available only if self.tuning_algo is ‘grid’.

See https://seaborn.pydata.org/generated/seaborn.heatmap.html
scatter_kws (dict, default=None) –
Additional parameters passed to matplotlib.pyplot.scatter(), e.g. alpha. Available only if self.tuning_algo is NOT ‘grid’.

See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html

random_search_tuning(estimator=None, tuning_params=None, cv=None, seed=None, scoring=None, n_iter=None, not_opt_params=None, param_scales=None, mlflow_logging=None, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, rand_kws=None, fit_params=None)

Run random search optimization.

Parameters

estimator (estimator object implementing fit, default=None) –
Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.

Note that the parameters of the estimator are overridden by not_opt_params.

If None, ESTIMATOR written in each tuning class is used.
tuning_params (dict(str, tuple(float, float)), default=None) –
Dictionary with parameter names (str) as keys and distributions or lists of parameters to try as values.

If None, CV_PARAMS_RANDOM written in each tuning class is used.
cv (int, cross-validation generator, or an iterable, default=5) –
Determines the cross-validation splitting strategy.

If int, it specifies the number of folds in a KFold.
seed (int, default=42) –
Seed for the random number generators of the estimator and the cross-validation splitter.

Note that “random_state” in not_opt_params is overridden by this argument.
scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification.) – Strategy to evaluate the performance of the cross-validated model on the test set.
n_iter (int, default=None) –
Number of parameter settings that are sampled. n_iter trades off runtime vs quality of the solution.

If None, N_ITER_RANDOM written in each tuning class is used.
not_opt_params (dict, default=None) –
Dictionary of parameters that are NOT optimized.

Note that the parameters override those of the estimator.

If None, NOT_OPT_PARAMS written in each tuning class is used.
param_scales (dict[str, {'linear', 'log'}], default=None) –
Dictionary with parameters’ scales.

If ‘linear’, the axis of result graph is drawn in linear scale.

If ‘log’, the axis of result graph is drawn in log scale.

If None, PARAM_SCALES written in each tuning class is used.
mlflow_logging (str, default=None) –
Strategy to record the result by MLflow library.

If ‘inside’, mlflow process is started in the tuning instance. So you need not use start_run() explicitly.

If ‘outside’, mlflow process is NOT started in the tuning instance. So you should use start_run() outside the tune-easy library.

If None, mlflow is not used.
mlflow_tracking_uri (str, default=None) –
Tracking uri for MLflow. This argument is passed to tracking_uri in mlflow.set_tracking_uri()

See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri
mlflow_artifact_location (str, default=None) –
Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment()

See https://mlflow.org/docs/latest/tracking.html#artifact-stores
mlflow_experiment_name (str, default=None) –
Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment()

See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment
rand_kws (dict, default=None) –
Additional parameters passed to sklearn.model_selection.RandomizedSearchCV, e.g. n_jobs.

Note that estimator, param_distributions, cv, and scoring CANNOT be used in this argument.

See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html
fit_params (dict, default=None) –
Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

If None, FIT_PARAMS written in each tuning class is used.

Note that if eval_set is None, self.X and self.y are set to eval_set automatically.

Returns

best_params (dict[str, float]) – Returns best parameters determined by optimization
best_score (float) – Returns best score determined by optimization.
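
When the estimator is a pipeline, the s__p naming rule for fit_params can be applied mechanically. The helper below is a minimal sketch, and the step name "lgbmr" and the parameter values are hypothetical. It simply prefixes every key with the name of the pipeline step that should receive it.

```python
def prefix_fit_params(fit_params, step_name):
    """Prefix each fit parameter with '<step_name>__' for use with a pipeline."""
    return {f"{step_name}__{key}": value for key, value in fit_params.items()}


# Hypothetical fit parameters intended for the final step of a pipeline.
fit_params = {"eval_metric": "rmse", "verbose": 0}
pipeline_fit_params = prefix_fit_params(fit_params, "lgbmr")
```

The step name must match the name given to the estimator when the pipeline was built. Otherwise scikit-learn cannot route the parameter to that step.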