Base class of parameter tuning
This class is intended for inheritance only, so you should not use it directly.
The tuning class of each ML estimator inherits from this class.
Note
To see the default arguments of each tuning class, see the following links:
- Classification
LGBMClassifierTuning
LogisticRegressionTuning
RFClassifierTuning
SVMClassifierTuning
XGBClassifierTuning
- Regression
ElasticNetTuning
LGBMRegressorTuning
RFRegressorTuning (https://c60evaporator.github.io/tune-easy/each_estimators.html#tune_easy.rf_tuning.RFRegressorTuning)
SVMRegressorTuning
XGBRegressorTuning
LinearRegression - No optimization, only display
tune_easy.param_tuning module
- class tune_easy.param_tuning.ParamTuning(X, y, x_colnames, y_colname=None, cv_group=None, eval_set_selection=None, **kwargs)
Bases: object
Base class of tuning classes.
This class is intended for inheritance only, so you should not use it directly.
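Many arguments of the methods in this class default to None and then fall back to a constant of the concrete tuning class. The following is a minimal sketch of that pattern in plain Python. The class names BaseTuning and ForestTuning, the method resolve, and the constant N_ITER are hypothetical and are not taken from the tune-easy source; only the idea of "None means use the class constant" comes from the parameter descriptions.

```python
class BaseTuning:
    """Hypothetical stand-in for a tuning base class (not tune-easy code)."""

    # Class-level defaults; concrete tuning classes override these.
    NOT_OPT_PARAMS = {}
    N_ITER = 10

    def resolve(self, n_iter=None, not_opt_params=None):
        # An argument left as None falls back to the class-level constant.
        if n_iter is None:
            n_iter = self.N_ITER
        if not_opt_params is None:
            not_opt_params = self.NOT_OPT_PARAMS
        # Copy the dict so callers cannot mutate the class constant.
        return n_iter, dict(not_opt_params)


class ForestTuning(BaseTuning):
    """Hypothetical subclass that only overrides the defaults."""

    NOT_OPT_PARAMS = {"n_estimators": 100}
    N_ITER = 50


tuner = ForestTuning()
defaults = tuner.resolve()        # both values come from ForestTuning
custom = tuner.resolve(n_iter=5)  # an explicit argument wins over the constant
print(defaults)
print(custom)
```

Keeping the defaults as class attributes means a subclass can change them without touching any method body.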
- bayes_opt_tuning(estimator=None, tuning_params=None, cv=None, seed=None, scoring=None, n_iter=None, init_points=None, acq=None, not_opt_params=None, int_params=None, param_scales=None, mlflow_logging=None, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, fit_params=None)
Run Bayesian optimization using the BayesianOptimization library.
- Parameters
estimator (estimator object implementing fit, default=None) –
Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.
Note that the parameters of the estimator are overridden by not_opt_params.
If None, ESTIMATOR defined in each tuning class is used.
tuning_params (dict[str, tuple(float, float)], default=None) –
Dictionary with parameter names (str) as keys and tuples of the minimum and maximum limits of each parameter as values.
If None, BAYES_PARAMS defined in each tuning class is used.
cv (int, cross-validation generator, or an iterable, default=5) –
Determines the cross-validation splitting strategy.
If int, it specifies the number of folds in a KFold.
seed (int, default=42) –
Seed for the random number generators of the estimator and the cross-validation.
Note that “random_state” in not_opt_params is overridden by this argument.
scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification) – Strategy to evaluate the performance of the cross-validated model on the test set.
n_iter (int, default=None) –
Number of iterations in Bayesian optimization.
If None, N_ITER_BAYES defined in each tuning class is used.
init_points (int, default=None) –
Number of initial points, which are searched randomly.
If None, INIT_POINTS defined in each tuning class is used.
acq ({'ei', 'pi', 'ucb'}, default='ei') – Acquisition function.
not_opt_params (dict, default=None) –
Dictionary of parameters that are NOT optimized.
Note that these parameters override those of the estimator.
If None, NOT_OPT_PARAMS defined in each tuning class is used.
int_params (list[str], default=None) –
List of parameters whose type is int.
If None, INT_PARAMS defined in each tuning class is used.
param_scales (dict(str, {'linear', 'log'}), default=None) –
Dictionary with the scale of each parameter.
If ‘linear’, the axis of the result graph is drawn in linear scale.
If ‘log’, the axis of the result graph is drawn in log scale.
If None, PARAM_SCALES defined in each tuning class is used.
mlflow_logging (str, default=None) –
Strategy to record the results with the MLflow library.
If ‘inside’, the MLflow process is started inside the tuning instance, so you need not call start_run() explicitly.
If ‘outside’, the MLflow process is NOT started inside the tuning instance, so you should call start_run() outside the tune-easy library.
If None, MLflow is not used.
mlflow_tracking_uri (str, default=None) –
Tracking URI for MLflow. This argument is passed to mlflow.set_tracking_uri().
See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri
mlflow_artifact_location (str, default=None) –
Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment().
See https://mlflow.org/docs/latest/tracking.html#artifact-stores
mlflow_experiment_name (str, default=None) –
Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment().
See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment
fit_params (dict, default=None) –
Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
If None, FIT_PARAMS defined in each tuning class is used.
Note that if eval_set is None, self.X and self.y are automatically used as eval_set.
- Returns
best_params (dict[str, float]) – Returns best parameters determined by optimization
best_score (float) – Returns best score determined by optimization.
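The parameter notes above describe a precedence: the estimator's own parameters are overridden by not_opt_params, and "random_state" is in turn overridden by seed. A minimal sketch of that precedence with plain dictionaries follows. The function merge_params is hypothetical and is not part of tune-easy; it also assumes that seed is applied only when a "random_state" key is already present, which the text does not state.

```python
def merge_params(estimator_params, not_opt_params, seed):
    """Hypothetical helper showing the override order described in the text."""
    merged = dict(estimator_params)   # lowest priority: the estimator's own values
    merged.update(not_opt_params)     # not_opt_params override the estimator
    # Assumption: seed replaces random_state only if the key exists.
    if "random_state" in merged:
        merged["random_state"] = seed
    return merged


merged = merge_params(
    estimator_params={"max_depth": 3, "random_state": 0},
    not_opt_params={"max_depth": 5, "n_estimators": 100, "random_state": 1},
    seed=42,
)
print(merged)
```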
- get_feature_importances()
Get feature importances of the best estimator. Available only if self.estimator is RandomForest, LightGBM, or XGBoost.
- Returns
df_importance – Returns feature importances of the best estimator as pandas.DataFrame
- Return type
pandas.DataFrame
- get_search_history()
Get high score history of optimization as pandas.DataFrame
- Returns
df_history – Returns high score history of optimization as pandas.DataFrame
- Return type
pandas.DataFrame
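One plausible reading of "high score history" is the best score seen so far at each iteration. The sketch below computes such a running best with the standard library. This interpretation, the variable names, and the score values are assumptions for illustration and are not taken from the tune-easy source.

```python
from itertools import accumulate

# Illustrative scores, one per trial (higher is better,
# e.g. a negated error metric).
scores = [-3.2, -2.9, -3.5, -2.4, -2.6]

# Running maximum: the best score observed up to and including each trial.
best_so_far = list(accumulate(scores, max))
print(best_so_far)
```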
- grid_search_tuning(estimator=None, tuning_params=None, cv=None, seed=None, scoring=None, not_opt_params=None, param_scales=None, mlflow_logging=None, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, grid_kws=None, fit_params=None)
Run grid search optimization.
- Parameters
estimator (estimator object implementing fit, default=None) –
Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.
Note that the parameters of the estimator are overridden by not_opt_params.
If None, ESTIMATOR defined in each tuning class is used.
tuning_params (dict[str, list(float)], default=None) –
Dictionary with parameter names (str) as keys and lists of parameter settings to try as values.
If None, CV_PARAMS_GRID defined in each tuning class is used.
cv (int, cross-validation generator, or an iterable, default=5) –
Determines the cross-validation splitting strategy.
If int, it specifies the number of folds in a KFold.
seed (int, default=42) –
Seed for the random number generators of the estimator and the cross-validation.
Note that “random_state” in not_opt_params is overridden by this argument.
scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification) – Strategy to evaluate the performance of the cross-validated model on the test set.
not_opt_params (dict, default=None) –
Dictionary of parameters that are NOT optimized.
Note that these parameters override those of the estimator.
If None, NOT_OPT_PARAMS defined in each tuning class is used.
param_scales (dict[str, {'linear', 'log'}], default=None) –
Dictionary with the scale of each parameter.
If ‘linear’, the axis of the result graph is drawn in linear scale.
If ‘log’, the axis of the result graph is drawn in log scale.
If None, PARAM_SCALES defined in each tuning class is used.
mlflow_logging (str, default=None) –
Strategy to record the results with the MLflow library.
If ‘inside’, the MLflow process is started inside the tuning instance, so you need not call start_run() explicitly.
If ‘outside’, the MLflow process is NOT started inside the tuning instance, so you should call start_run() outside the tune-easy library.
If None, MLflow is not used.
mlflow_tracking_uri (str, default=None) –
Tracking URI for MLflow. This argument is passed to mlflow.set_tracking_uri().
See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri
mlflow_artifact_location (str, default=None) –
Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment().
See https://mlflow.org/docs/latest/tracking.html#artifact-stores
mlflow_experiment_name (str, default=None) –
Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment().
See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment
grid_kws (dict, default=None) –
Additional parameters passed to sklearn.model_selection.GridSearchCV, e.g. n_jobs.
Note that estimator, param_grid, cv, and scoring CANNOT be used in this argument.
See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
fit_params (dict, default=None) –
Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
If None, FIT_PARAMS defined in each tuning class is used.
Note that if eval_set is None, self.X and self.y are automatically used as eval_set.
- Returns
best_params (dict[str, float]) – Returns best parameters determined by optimization
best_score (float) – Returns best score determined by optimization.
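Grid search evaluates every combination of the candidate lists given in tuning_params, so the number of settings is the product of the list lengths. The following sketch enumerates such a grid with the standard library. The function expand_grid and the parameter names gamma and C are illustrative only and do not come from tune-easy.

```python
from itertools import product


def expand_grid(tuning_params):
    """Return every combination of the candidate values as a list of dicts."""
    names = list(tuning_params)
    value_lists = [tuning_params[name] for name in names]
    # product() yields one tuple per combination, in the order of `names`.
    return [dict(zip(names, values)) for values in product(*value_lists)]


grid = expand_grid({"gamma": [0.01, 0.1, 1.0], "C": [1, 10]})
print(len(grid))   # 3 candidates for gamma times 2 for C
print(grid[0])
print(grid[-1])
```

Because the count grows multiplicatively, adding one more parameter with four candidates would quadruple the number of cross-validated fits.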
- optuna_tuning(estimator=None, tuning_params=None, cv=None, seed=None, scoring=None, n_trials=None, study_kws=None, optimize_kws=None, not_opt_params=None, int_params=None, param_scales=None, mlflow_logging=None, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, fit_params=None)
Run Bayesian optimization using the Optuna library.
This method is usually faster than the other tuning methods, so we recommend using it.
- Parameters
estimator (estimator object implementing fit, default=None) –
Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.
Note that the parameters of the estimator are overridden by not_opt_params.
If None, ESTIMATOR defined in each tuning class is used.
tuning_params (dict(str, tuple(float, float)), default=None) –
Dictionary with parameter names (str) as keys and tuples of the minimum and maximum limits of each parameter as values.
If None, BAYES_PARAMS defined in each tuning class is used.
cv (int, cross-validation generator, or an iterable, default=5) –
Determines the cross-validation splitting strategy.
If int, it specifies the number of folds in a KFold.
seed (int, default=42) –
Seed for the random number generators of the estimator and the cross-validation.
Note that “random_state” in not_opt_params is overridden by this argument.
scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification) – Strategy to evaluate the performance of the cross-validated model on the test set.
n_trials (int, default=None) –
Number of parameter settings that are sampled. n_trials trades off runtime vs quality of the solution.
If None, N_ITER_OPTUNA defined in each tuning class is used.
study_kws (dict, default=None) –
Additional parameters passed to optuna.study.create_study, e.g. sampler.
optimize_kws (dict, default=None) –
Additional parameters passed to optuna.study.Study.optimize, e.g. n_jobs.
Note that n_trials CANNOT be used in this argument.
not_opt_params (dict, default=None) –
Dictionary of parameters that are NOT optimized.
Note that these parameters override those of the estimator.
If None, NOT_OPT_PARAMS defined in each tuning class is used.
int_params (list[str], default=None) –
List of parameters whose type is int. These parameters are tuned by the suggest_int() method.
If None, INT_PARAMS defined in each tuning class is used.
param_scales (dict[str, {'linear', 'log'}], default=None) –
Dictionary with the scale of each parameter, which is passed to the log argument of suggest_float() or suggest_int().
If ‘linear’, the axis of the result graph is drawn in linear scale.
If ‘log’, the axis of the result graph is drawn in log scale.
If None, PARAM_SCALES defined in each tuning class is used.
mlflow_logging (str, default=None) –
Strategy to record the results with the MLflow library.
If ‘inside’, the MLflow process is started inside the tuning instance, so you need not call start_run() explicitly.
If ‘outside’, the MLflow process is NOT started inside the tuning instance, so you should call start_run() outside the tune-easy library.
If None, MLflow is not used.
mlflow_tracking_uri (str, default=None) –
Tracking URI for MLflow. This argument is passed to mlflow.set_tracking_uri().
See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri
mlflow_artifact_location (str, default=None) –
Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment().
See https://mlflow.org/docs/latest/tracking.html#artifact-stores
mlflow_experiment_name (str, default=None) –
Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment().
See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment
fit_params (dict, default=None) –
Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
If None, FIT_PARAMS defined in each tuning class is used.
Note that if eval_set is None, self.X and self.y are automatically used as eval_set.
- Returns
best_params (dict[str, float]) – Returns best parameters determined by optimization
best_score (float) – Returns best score determined by optimization.
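The interplay of tuning_params (a minimum and maximum per parameter), param_scales and int_params can be illustrated without Optuna. The sketch below draws values uniformly on a linear or logarithmic scale and rounds the integer parameters. The function sample_param is a hypothetical stand-in written with the standard library; it is not how suggest_float() or suggest_int() are implemented, and rounding a float only approximates an integer distribution.

```python
import math
import random


def sample_param(rng, low, high, scale="linear", is_int=False):
    """Draw one value in [low, high] on a linear or log scale (illustrative)."""
    if scale == "log":
        # Uniform in log space, so each order of magnitude is equally likely.
        value = math.exp(rng.uniform(math.log(low), math.log(high)))
    elif scale == "linear":
        value = rng.uniform(low, high)
    else:
        raise ValueError(f"unknown scale: {scale!r}")
    if is_int:
        value = int(round(value))
    # Clamp to guard against floating-point overshoot at the bounds.
    return min(max(value, low), high)


rng = random.Random(42)
learning_rates = [sample_param(rng, 1e-4, 1.0, scale="log") for _ in range(200)]
depths = [sample_param(rng, 2, 32, scale="linear", is_int=True) for _ in range(200)]
print(min(learning_rates), max(learning_rates))
print(min(depths), max(depths))
```

A log scale is the usual choice for parameters such as learning rates or regularization strengths whose plausible values span several orders of magnitude.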
- plot_best_learning_curve(plot_stats='mean', ax=None)
Plot learning curve after optimization. This method is used to assess whether the optimized model is overfitting or not.
- Parameters
plot_stats ({'mean', 'median'}) –
Statistic plotted as the learning curve.
If ‘mean’, mean values are plotted as a dark line and standard deviation values are filled in a light color.
If ‘median’, median values are plotted as a dark line and minimum and maximum values are filled in a light color.
ax (matplotlib.axes.Axes, default=None) – Pre-existing axes for the plot.
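The two plot_stats options summarize the per-fold cross-validation scores differently. The sketch below computes the center line and the band limits for both options with the standard library. The function summarize, the fold scores, and the choice of a band of one population standard deviation around the mean are assumptions for illustration, not details confirmed by the tune-easy source.

```python
from statistics import mean, median, pstdev


def summarize(fold_scores, plot_stats="mean"):
    """Return (center, lower, upper) for one point of the curve (illustrative)."""
    if plot_stats == "mean":
        center = mean(fold_scores)
        spread = pstdev(fold_scores)  # assumption: population standard deviation
        return center, center - spread, center + spread
    if plot_stats == "median":
        return median(fold_scores), min(fold_scores), max(fold_scores)
    raise ValueError(f"unknown plot_stats: {plot_stats!r}")


fold_scores = [0.80, 0.82, 0.78, 0.90, 0.80]  # one score per CV fold
mean_band = summarize(fold_scores, "mean")
median_band = summarize(fold_scores, "median")
print(mean_band)
print(median_band)
```

With an outlying fold such as 0.90 here, the median line stays at the typical value while the min/max band shows the full range.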
- plot_best_validation_curve(validation_curve_params=None, param_scales=None, plot_stats='mean', axes=None)
Plot validation curve after optimization.
This method is used to assess whether the optimized model reaches the highest point of the score.
It is also used to assess whether the optimized model is overfitting or not.
- Parameters
validation_curve_params (dict(str, list(float)), default=None) –
Dictionary with parameter names (str) as keys and lists of parameter values to be evaluated as values.
If None, VALIDATION_CURVE_PARAMS defined in each tuning class is used.
param_scales (dict(str, {'linear', 'log'}), default=None) –
Dictionary with the scale of each parameter.
If ‘linear’, the axis of the result graph is drawn in linear scale.
If ‘log’, the axis of the result graph is drawn in log scale.
If None, PARAM_SCALES defined in each tuning class is used.
plot_stats ({'mean', 'median'}) –
Statistic plotted as the validation curve.
If ‘mean’, mean values are plotted as a dark line and standard deviation values are filled in a light color.
If ‘median’, median values are plotted as a dark line and minimum and maximum values are filled in a light color.
axes (list[matplotlib.axes.Axes]) –
List of pre-existing axes for the plot.
If None, each validation curve is plotted in a different figure.
- plot_feature_importances(ax=None)
Plot feature importances of the best estimator. Available only if self.estimator is RandomForest, LightGBM, or XGBoost.
- Parameters
ax (matplotlib.axes.Axes, default=None) – Pre-existing axes for the plot.
- plot_first_validation_curve(estimator=None, validation_curve_params=None, cv=None, seed=None, scoring=None, not_opt_params=None, param_scales=None, plot_stats='mean', axes=None, fit_params=None)
Plot validation curve before optimization. This method is used to determine parameter range.
- Parameters
estimator (estimator object implementing fit, default=None) –
Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.
Note that the parameters of the estimator are overridden by not_opt_params.
If None, ESTIMATOR defined in each tuning class is used.
validation_curve_params (dict(str, list(float)), default=None) –
Dictionary with parameter names (str) as keys and lists of parameter values to be evaluated as values.
If None, VALIDATION_CURVE_PARAMS defined in each tuning class is used.
cv (int, cross-validation generator, or an iterable, default=5) –
Determines the cross-validation splitting strategy.
If int, it specifies the number of folds in a KFold.
seed (int, default=42) –
Seed for the random number generators of the estimator and the cross-validation.
Note that “random_state” in not_opt_params is overridden by this argument.
scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification) – Strategy to evaluate the performance of the cross-validated model on the test set.
not_opt_params (dict, default=None) –
Dictionary of parameters that are NOT optimized.
Note that these parameters override those of the estimator.
If None, NOT_OPT_PARAMS defined in each tuning class is used.
param_scales (dict(str, {'linear', 'log'}), default=None) –
Dictionary with the scale of each parameter.
If ‘linear’, the axis of the result graph is drawn in linear scale.
If ‘log’, the axis of the result graph is drawn in log scale.
If None, PARAM_SCALES defined in each tuning class is used.
plot_stats ({'mean', 'median'}) –
Statistic plotted as the validation curve.
If ‘mean’, mean values are plotted as a dark line and standard deviation values are filled in a light color.
If ‘median’, median values are plotted as a dark line and minimum and maximum values are filled in a light color.
axes (list[matplotlib.axes.Axes]) –
List of pre-existing axes for the plot.
If None, each validation curve is plotted in a different figure.
fit_params (dict, default=None) –
Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
If None, FIT_PARAMS defined in each tuning class is used.
Note that if eval_set is None, self.X and self.y are automatically used as eval_set.
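The s__p naming rule for pipeline estimators mentioned under fit_params can be applied mechanically. The sketch below prefixes every key of a fit_params dictionary with a step name. The helper prefix_fit_params and the step name "lgbm" are hypothetical and not part of tune-easy; the keys are placeholders chosen for illustration.

```python
def prefix_fit_params(step_name, fit_params):
    """Rename each key p to '<step_name>__p', the pipeline convention s__p."""
    return {f"{step_name}__{key}": value for key, value in fit_params.items()}


# Placeholder fit parameters for the final step of a pipeline named "lgbm".
plain = {"eval_metric": "rmse", "verbose": 0}
prefixed = prefix_fit_params("lgbm", plain)
print(prefixed)
```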
- plot_search_history(ax=None, x_axis='index', plot_kws=None)
Plot high score history of optimization.
- Parameters
ax (matplotlib.axes.Axes, default=None) – Pre-existing axes for the plot.
x_axis (str, optional) –
Type of x axis.
If ‘index’, the iteration index is put on the x axis.
If ‘time’, the elapsed time is put on the x axis.
plot_kws (dict, optional) –
Additional parameters passed to matplotlib.axes.Axes.plot(), e.g. alpha.
See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.plot.html
- plot_search_map(order=None, pair_n=4, rounddigits_title=3, rank_number=None, rounddigits_score=3, subplot_kws=None, heat_kws=None, scatter_kws=None)
Plot score map. Values of parameters are plotted as X and Y axes. Scores are plotted as color density.
If self.tuning_algo is ‘grid’, the map is plotted as heat map.
Else, the map is plotted as scatter plot.
- Parameters
order (list[str], default=None) –
Axis order of parameters. The parameters are assigned to axes in the following order: x-axis of each graph, y-axis of each graph, y-axis of all graphs, x-axis of all graphs.
If None, the axis order of parameters is determined by parameter importance, which is calculated by RandomForestRegressor using parameter values as X and score values as y.
pair_n (int, default=4) – Number of rows/columns of the maps. Available only if the number of parameters is three or more. If self.tuning_algo is ‘grid’, this argument is NOT available.
rounddigits_title (int, default=3) – Number of decimal digits to which the parameter range values displayed in graph titles are rounded. If self.tuning_algo is ‘grid’, this argument is NOT available.
rank_number (int, default=None) – Number of emphasized data points that are in the top positions for their score.
rounddigits_score (int, default=3) – Number of decimal digits to which the regression errors of the data points in the top positions are rounded.
subplot_kws (dict, default=None) –
Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize.
See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
heat_kws (dict, default=None) –
Additional parameters passed to sns.heatmap(), e.g. cmap. Available only if self.tuning_algo is ‘grid’.
See https://seaborn.pydata.org/generated/seaborn.heatmap.html
scatter_kws (dict, default=None) –
Additional parameters passed to matplotlib.pyplot.scatter(), e.g. alpha. Available only if self.tuning_algo is NOT ‘grid’.
See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html
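The rank_number argument selects how many of the best-scoring points are emphasized on the map. A minimal sketch of such a selection is shown below. The function top_trials, the record layout, and the score values are hypothetical, and the sketch assumes that a higher score is better, as with scikit-learn's negated error metrics.

```python
def top_trials(trials, rank_number):
    """Return the rank_number trials with the highest score, best first."""
    ranked = sorted(trials, key=lambda trial: trial["score"], reverse=True)
    return ranked[:rank_number]


# Illustrative search results: one record per evaluated parameter setting.
trials = [
    {"params": {"C": 0.1}, "score": 0.81},
    {"params": {"C": 1.0}, "score": 0.88},
    {"params": {"C": 10.0}, "score": 0.85},
    {"params": {"C": 100.0}, "score": 0.79},
]
top = top_trials(trials, rank_number=2)
print([trial["params"] for trial in top])
```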
- random_search_tuning(estimator=None, tuning_params=None, cv=None, seed=None, scoring=None, n_iter=None, not_opt_params=None, param_scales=None, mlflow_logging=None, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, rand_kws=None, fit_params=None)
Run random search optimization.
- Parameters
estimator (estimator object implementing fit, default=None) –
Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.
Note that the parameters of the estimator are overridden by not_opt_params.
If None, ESTIMATOR defined in each tuning class is used.
tuning_params (dict(str, tuple(float, float)), default=None) –
Dictionary with parameter names (str) as keys and distributions or lists of parameters to try as values.
If None, CV_PARAMS_RANDOM defined in each tuning class is used.
cv (int, cross-validation generator, or an iterable, default=5) –
Determines the cross-validation splitting strategy.
If int, it specifies the number of folds in a KFold.
seed (int, default=42) –
Seed for the random number generators of the estimator and the cross-validation.
Note that “random_state” in not_opt_params is overridden by this argument.
scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification) – Strategy to evaluate the performance of the cross-validated model on the test set.
n_iter (int, default=None) –
Number of parameter settings that are sampled. n_iter trades off runtime vs quality of the solution.
If None, N_ITER_RANDOM defined in each tuning class is used.
not_opt_params (dict, default=None) –
Dictionary of parameters that are NOT optimized.
Note that these parameters override those of the estimator.
If None, NOT_OPT_PARAMS defined in each tuning class is used.
param_scales (dict[str, {'linear', 'log'}], default=None) –
Dictionary with the scale of each parameter.
If ‘linear’, the axis of the result graph is drawn in linear scale.
If ‘log’, the axis of the result graph is drawn in log scale.
If None, PARAM_SCALES defined in each tuning class is used.
mlflow_logging (str, default=None) –
Strategy to record the results with the MLflow library.
If ‘inside’, the MLflow process is started inside the tuning instance, so you need not call start_run() explicitly.
If ‘outside’, the MLflow process is NOT started inside the tuning instance, so you should call start_run() outside the tune-easy library.
If None, MLflow is not used.
mlflow_tracking_uri (str, default=None) –
Tracking URI for MLflow. This argument is passed to mlflow.set_tracking_uri().
See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri
mlflow_artifact_location (str, default=None) –
Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment().
See https://mlflow.org/docs/latest/tracking.html#artifact-stores
mlflow_experiment_name (str, default=None) –
Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment().
See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment
rand_kws (dict, default=None) –
Additional parameters passed to sklearn.model_selection.RandomizedSearchCV, e.g. n_jobs.
Note that estimator, param_distributions, cv, and scoring CANNOT be used in this argument.
See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html
fit_params (dict, default=None) –
Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
If None, FIT_PARAMS defined in each tuning class is used.
Note that if eval_set is None, self.X and self.y are automatically used as eval_set.
- Returns
best_params (dict[str, float]) – Returns best parameters determined by optimization
best_score (float) – Returns best score determined by optimization.
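Random search draws n_iter parameter settings instead of enumerating all of them, and a fixed seed makes the draw repeatable. The sketch below illustrates this with the standard library. The function sample_settings and the parameter names are hypothetical, and the sketch samples each value independently with replacement, which may differ from what RandomizedSearchCV does when every entry is a list.

```python
import random


def sample_settings(tuning_params, n_iter, seed):
    """Draw n_iter random settings from lists of candidate values."""
    rng = random.Random(seed)  # private generator, so the global state is untouched
    return [
        {name: rng.choice(values) for name, values in tuning_params.items()}
        for _ in range(n_iter)
    ]


candidates = {"gamma": [0.001, 0.01, 0.1, 1.0], "C": [0.1, 1, 10, 100]}
first_run = sample_settings(candidates, n_iter=5, seed=42)
second_run = sample_settings(candidates, n_iter=5, seed=42)
print(first_run == second_run)  # same seed, same settings
```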