Base class of parameter tuning

This class is intended for inheritance only, so you should not use it directly.

The tuning class of each ML estimator inherits from this class.

Note

To see the default arguments of each tuning class, see the following link:

tune_easy.param_tuning module

class tune_easy.param_tuning.ParamTuning(X, y, x_colnames, y_colname=None, cv_group=None, eval_set_selection=None, **kwargs)

Bases: object

Base class of tuning classes

This class is intended for inheritance only, so you should not use it directly.
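The method descriptions in this class repeatedly say that an argument left as None falls back to a constant written in each tuning class (ESTIMATOR, NOT_OPT_PARAMS, and so on). The following is a minimal sketch of that fallback pattern, not tune-easy's actual code: the class names BaseTuning and ForestTuning, the method resolve, and the constant N_ITER are hypothetical, and a placeholder string stands in for a real estimator.

```python
class BaseTuning:
    """Hypothetical stand-in for a base tuning class (not tune-easy's code)."""

    ESTIMATOR = None  # each subclass is expected to override this
    N_ITER = None     # hypothetical constant, for illustration only

    def resolve(self, estimator=None, n_iter=None):
        # An explicit argument wins; None falls back to the class constant.
        estimator = self.ESTIMATOR if estimator is None else estimator
        n_iter = self.N_ITER if n_iter is None else n_iter
        return estimator, n_iter


class ForestTuning(BaseTuning):
    """Hypothetical subclass that only supplies defaults."""

    ESTIMATOR = "RandomForestRegressor()"  # placeholder string, not a real estimator
    N_ITER = 50


tuning = ForestTuning()
defaults = tuning.resolve()             # every argument left as None
overridden = tuning.resolve(n_iter=10)  # one argument given explicitly
print(defaults)
print(overridden)
```

Under this reading, the base class carries the shared logic while each subclass carries only its defaults, which would explain why the base class is not meant to be used directly.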

bayes_opt_tuning(estimator=None, tuning_params=None, cv=None, seed=None, scoring=None, n_iter=None, init_points=None, acq=None, not_opt_params=None, int_params=None, param_scales=None, mlflow_logging=None, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, fit_params=None)

Run Bayesian optimization using the BayesianOptimization library.

Parameters
  • estimator (estimator object implementing fit, default=None) –

    Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.

    Note that the parameters of the estimator are overridden by not_opt_params.

    If None, ESTIMATOR written in each tuning class is used.

  • tuning_params (dict[str, tuple(float, float)], default=None) –

    Dictionary with parameter names (str) as keys and tuples of the lower and upper limits of each parameter as values.

    If None, BAYES_PARAMS written in each tuning class is used.

  • cv (int, cross-validation generator, or an iterable, default=5) –

    Determines the cross-validation splitting strategy.

    If int, it specifies the number of folds in a KFold.

  • seed (int, default=42) –

    Seed for the random number generators of the estimator and cross-validation.

    Note that “random_state” in not_opt_params is overridden by this argument.

  • scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification.) – Strategy to evaluate the performance of the cross-validated model on the test set.

  • n_iter (int, default=None) –

    Number of iterations in bayesian optimization.

    If None, N_ITER_BAYES written in each tuning class is used.

  • init_points (int, default=None) –

    Number of initial points, which are searched randomly.

    If None, INIT_POINTS written in each tuning class is used.

  • acq ({'ei', 'pi', 'ucb'}, default='ei') – Acquisition function

  • not_opt_params (dict, default=None) –

    Dictionary of parameters that are NOT optimized.

    Note that these parameters override those of the estimator.

    If None, NOT_OPT_PARAMS written in each tuning class is used.

  • int_params (list[str], default=None) –

    List of parameters whose type is int.

    If None, INT_PARAMS written in each tuning class is used.

  • param_scales (dict(str, {'linear', 'log'}), default=None) –

    Dictionary of parameter scales.

    If ‘linear’, the axis of the result graph is drawn on a linear scale.

    If ‘log’, the axis of the result graph is drawn on a log scale.

    If None, PARAM_SCALES written in each tuning class is used.

  • mlflow_logging (str, default=None) –

    Strategy for recording the results with the MLflow library.

    If ‘inside’, the MLflow run is started inside the tuning instance, so you do not need to call start_run() explicitly.

    If ‘outside’, the MLflow run is NOT started inside the tuning instance, so you must call start_run() yourself, outside the tune-easy library.

    If None, mlflow is not used.

  • mlflow_tracking_uri (str, default=None) –

    Tracking URI for MLflow. This argument is passed to mlflow.set_tracking_uri().

    See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri

  • mlflow_artifact_location (str, default=None) –

    Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment()

    See https://mlflow.org/docs/latest/tracking.html#artifact-stores

  • mlflow_experiment_name (str, default=None) –

    Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment()

    See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment

  • fit_params (dict, default=None) –

    Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

    If None, FIT_PARAMS written in each tuning class is used.

    Note that if eval_set is None, self.X and self.y are set to eval_set automatically.

Returns

  • best_params (dict[str, float]) – Returns best parameters determined by optimization

  • best_score (float) – Returns best score determined by optimization.
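The precedence rules stated above (not_opt_params override the estimator's parameters, and seed overrides “random_state” in not_opt_params) can be sketched with plain dictionaries. This is an illustration under assumptions, not the library's implementation: the helper names merge_fixed_params and to_estimator_params are hypothetical, and rounding proposed values for the names in int_params is an assumed way of turning the optimizer's continuous proposals into integers.

```python
def merge_fixed_params(estimator_params, not_opt_params, seed):
    """Combine fixed parameters following the documented precedence."""
    merged = dict(estimator_params)
    merged.update(not_opt_params)  # not_opt_params override the estimator
    if "random_state" in not_opt_params:
        merged["random_state"] = seed  # seed overrides random_state
    return merged


def to_estimator_params(proposal, int_params):
    """Convert float proposals to int for the names in int_params (assumed rounding)."""
    return {
        name: int(round(value)) if name in int_params else value
        for name, value in proposal.items()
    }


fixed = merge_fixed_params(
    estimator_params={"n_estimators": 100, "max_depth": 3},
    not_opt_params={"max_depth": 5, "random_state": 0},
    seed=42,
)
tuned = to_estimator_params(
    proposal={"max_depth": 4.6, "learning_rate": 0.05},
    int_params=["max_depth"],
)
print(fixed)
print(tuned)
```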

get_feature_importances()

Get the feature importances of the best estimator. Available only if self.estimator is RandomForest, LightGBM, or XGBoost.

Returns

df_importance – Returns the feature importances of the best estimator as a pandas.DataFrame

Return type

pandas.DataFrame

get_search_history()

Get high score history of optimization as pandas.DataFrame

Returns

df_history – Returns high score history of optimization as pandas.DataFrame

Return type

pandas.DataFrame
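One plausible reading of "high score history" is the best score found so far at each iteration. The sketch below computes such a running best with the standard library; it assumes that a higher score is better (as with 'neg_root_mean_squared_error'), and the score values are made up for illustration. It does not reproduce the columns of the DataFrame that this method returns.

```python
from itertools import accumulate

# Hypothetical per-iteration scores (higher is better).
scores = [-3.2, -2.9, -3.1, -2.4, -2.6]

# Running maximum: the best score seen up to and including each iteration.
best_so_far = list(accumulate(scores, max))

for index, (score, best) in enumerate(zip(scores, best_so_far)):
    print(index, score, best)
```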

grid_search_tuning(estimator=None, tuning_params=None, cv=None, seed=None, scoring=None, not_opt_params=None, param_scales=None, mlflow_logging=None, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, grid_kws=None, fit_params=None)

Run grid search optimization.

Parameters
  • estimator (estimator object implementing fit, default=None) –

    Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.

    Note that the parameters of the estimator are overridden by not_opt_params.

    If None, ESTIMATOR written in each tuning class is used.

  • tuning_params (dict[str, list(float)], default=None) –

    Dictionary with parameter names (str) as keys and lists of parameter settings to try as values.

    If None, CV_PARAMS_GRID written in each tuning class is used.

  • cv (int, cross-validation generator, or an iterable, default=5) –

    Determines the cross-validation splitting strategy.

    If int, it specifies the number of folds in a KFold.

  • seed (int, default=42) –

    Seed for the random number generators of the estimator and cross-validation.

    Note that “random_state” in not_opt_params is overridden by this argument.

  • scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification.) – Strategy to evaluate the performance of the cross-validated model on the test set.

  • not_opt_params (dict, default=None) –

    Dictionary of parameters that are NOT optimized.

    Note that these parameters override those of the estimator.

    If None, NOT_OPT_PARAMS written in each tuning class is used.

  • param_scales (dict[str, {'linear', 'log'}], default=None) –

    Dictionary of parameter scales.

    If ‘linear’, the axis of the result graph is drawn on a linear scale.

    If ‘log’, the axis of the result graph is drawn on a log scale.

    If None, PARAM_SCALES written in each tuning class is used.

  • mlflow_logging (str, default=None) –

    Strategy for recording the results with the MLflow library.

    If ‘inside’, the MLflow run is started inside the tuning instance, so you do not need to call start_run() explicitly.

    If ‘outside’, the MLflow run is NOT started inside the tuning instance, so you must call start_run() yourself, outside the tune-easy library.

    If None, mlflow is not used.

  • mlflow_tracking_uri (str, default=None) –

    Tracking URI for MLflow. This argument is passed to mlflow.set_tracking_uri().

    See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri

  • mlflow_artifact_location (str, default=None) –

    Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment()

    See https://mlflow.org/docs/latest/tracking.html#artifact-stores

  • mlflow_experiment_name (str, default=None) –

    Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment()

    See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment

  • grid_kws (dict, default=None) –

    Additional parameters passed to sklearn.model_selection.GridSearchCV, e.g. n_jobs.

    Note that estimator, param_grid, cv, and scoring cannot be passed in this argument.

    See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

  • fit_params (dict, default=None) –

    Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

    If None, FIT_PARAMS written in each tuning class is used.

    Note that if eval_set is None, self.X and self.y are set to eval_set automatically.

Returns

  • best_params (dict[str, float]) – Returns best parameters determined by optimization

  • best_score (float) – Returns best score determined by optimization.
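A grid search evaluates every combination of the lists given in tuning_params, so the cost grows multiplicatively with each added parameter. The sketch below expands a grid with the standard library to make that cost visible before running a search. The helper expand_grid and the parameter names gamma and C are illustrative assumptions; the actual search is delegated to sklearn.model_selection.GridSearchCV as described above.

```python
from itertools import product


def expand_grid(tuning_params):
    """Return every combination of the listed parameter values as a dict."""
    names = list(tuning_params)
    return [
        dict(zip(names, values))
        for values in product(*tuning_params.values())
    ]


# Hypothetical grid: 3 values of gamma times 2 values of C.
tuning_params = {"gamma": [0.01, 0.1, 1.0], "C": [1, 10]}
candidates = expand_grid(tuning_params)

cv = 5  # number of folds
n_cv_fits = len(candidates) * cv  # cross-validation fits, excluding any final refit

print(len(candidates), n_cv_fits)
```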

optuna_tuning(estimator=None, tuning_params=None, cv=None, seed=None, scoring=None, n_trials=None, study_kws=None, optimize_kws=None, not_opt_params=None, int_params=None, param_scales=None, mlflow_logging=None, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, fit_params=None)

Run Bayesian optimization using the Optuna library.

This method is usually faster than other tuning methods, so we recommend using it.

Parameters
  • estimator (estimator object implementing fit, default=None) –

    Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.

    Note that the parameters of the estimator are overridden by not_opt_params.

    If None, ESTIMATOR written in each tuning class is used.

  • tuning_params (dict(str, tuple(float, float)), default=None) –

    Dictionary with parameter names (str) as keys and tuples of the lower and upper limits of each parameter as values.

    If None, BAYES_PARAMS written in each tuning class is used.

  • cv (int, cross-validation generator, or an iterable, default=5) –

    Determines the cross-validation splitting strategy.

    If int, it specifies the number of folds in a KFold.

  • seed (int, default=42) –

    Seed for the random number generators of the estimator and cross-validation.

    Note that “random_state” in not_opt_params is overridden by this argument.

  • scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification.) – Strategy to evaluate the performance of the cross-validated model on the test set.

  • n_trials (int, default=None) –

    Number of trials, i.e. parameter settings that are sampled. n_trials trades off runtime vs quality of the solution.

    If None, N_ITER_OPTUNA written in each tuning class is used.

  • study_kws (dict, default=None) –

    Additional parameters passed to optuna.study.create_study, e.g. sampler.

    See https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.create_study.html#optuna.study.create_study

  • optimize_kws (dict, default=None) –

    Additional parameters passed to optuna.study.Study.optimize, e.g. n_jobs.

    Note that n_trials cannot be passed in this argument.

    See https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.Study.html#optuna.study.Study.optimize

  • not_opt_params (dict, default=None) –

    Dictionary of parameters that are NOT optimized.

    Note that these parameters override those of the estimator.

    If None, NOT_OPT_PARAMS written in each tuning class is used.

  • int_params (list[str], default=None) –

    List of parameters whose type is int. These parameters are tuned with the suggest_int() method.

    If None, INT_PARAMS written in each tuning class is used.

  • param_scales (dict[str, {'linear', 'log'}], default=None) –

    Dictionary of parameter scales, which are passed to the log argument of suggest_float() or suggest_int().

    If ‘linear’, the axis of the result graph is drawn on a linear scale.

    If ‘log’, the axis of the result graph is drawn on a log scale.

    If None, PARAM_SCALES written in each tuning class is used.

  • mlflow_logging (str, default=None) –

    Strategy for recording the results with the MLflow library.

    If ‘inside’, the MLflow run is started inside the tuning instance, so you do not need to call start_run() explicitly.

    If ‘outside’, the MLflow run is NOT started inside the tuning instance, so you must call start_run() yourself, outside the tune-easy library.

    If None, mlflow is not used.

  • mlflow_tracking_uri (str, default=None) –

    Tracking URI for MLflow. This argument is passed to mlflow.set_tracking_uri().

    See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri

  • mlflow_artifact_location (str, default=None) –

    Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment()

    See https://mlflow.org/docs/latest/tracking.html#artifact-stores

  • mlflow_experiment_name (str, default=None) –

    Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment()

    See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment

  • fit_params (dict, default=None) –

    Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

    If None, FIT_PARAMS written in each tuning class is used.

    Note that if eval_set is None, self.X and self.y are set to eval_set automatically.

Returns

  • best_params (dict[str, float]) – Returns best parameters determined by optimization

  • best_score (float) – Returns best score determined by optimization.
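The interplay of tuning_params, int_params, and param_scales described above can be sketched as a helper that asks a trial object for one value per parameter. This is a sketch under assumptions rather than the library's code: suggest_params is a hypothetical helper, and RecordingTrial is a stand-in that mimics only the suggest_int(name, low, high, log=...) and suggest_float(name, low, high, log=...) calls of Optuna's Trial, so the example runs without Optuna installed. The parameter names are illustrative.

```python
def suggest_params(trial, tuning_params, int_params, param_scales):
    """Ask the trial for one value per parameter, honoring type and scale."""
    params = {}
    for name, (low, high) in tuning_params.items():
        use_log = param_scales.get(name, "linear") == "log"
        if name in int_params:
            params[name] = trial.suggest_int(name, int(low), int(high), log=use_log)
        else:
            params[name] = trial.suggest_float(name, low, high, log=use_log)
    return params


class RecordingTrial:
    """Stand-in for an Optuna trial: records each call and returns the lower bound."""

    def __init__(self):
        self.calls = []

    def suggest_int(self, name, low, high, log=False):
        self.calls.append(("int", name, low, high, log))
        return low

    def suggest_float(self, name, low, high, log=False):
        self.calls.append(("float", name, low, high, log))
        return low


trial = RecordingTrial()
params = suggest_params(
    trial,
    tuning_params={"learning_rate": (0.001, 0.3), "num_leaves": (2, 50)},
    int_params=["num_leaves"],
    param_scales={"learning_rate": "log", "num_leaves": "linear"},
)
print(params)
print(trial.calls)
```

With a real Optuna study, a function like this would be called inside the objective, and the returned dictionary would be merged with the fixed parameters before the estimator is cross-validated.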

plot_best_learning_curve(plot_stats='mean', ax=None)

Plot learning curve after optimization. This method is used to assess whether the optimized model is overfitting or not.

Parameters
  • plot_stats ({'mean', 'median'}) –

    Statistic plotted as the learning curve.

    If ‘mean’, mean values are plotted as a dark line and the standard deviation range is filled in a light color.

    If ‘median’, median values are plotted as a dark line and the range between the minimum and maximum values is filled in a light color.

  • ax (matplotlib.axes.Axes, default=None) – Pre-existing axes for the plot.

plot_best_validation_curve(validation_curve_params=None, param_scales=None, plot_stats='mean', axes=None)

Plot validation curve after optimization.

This method is used to assess whether the optimized model reaches the highest point of the score.

Also, this method is used to assess whether the optimized model is overfitting or not.

Parameters
  • validation_curve_params (dict(str, list(float)), default=None) –

    Dictionary with parameter names (str) as keys and lists of parameter values to be evaluated as values.

    If None, VALIDATION_CURVE_PARAMS written in each tuning class is used.

  • param_scales (dict(str, {'linear', 'log'}), default=None) –

    Dictionary of parameter scales.

    If ‘linear’, the axis of the result graph is drawn on a linear scale.

    If ‘log’, the axis of the result graph is drawn on a log scale.

    If None, PARAM_SCALES written in each tuning class is used.

  • plot_stats ({'mean', 'median'}) –

    Statistic plotted as the validation curve.

    If ‘mean’, mean values are plotted as a dark line and the standard deviation range is filled in a light color.

    If ‘median’, median values are plotted as a dark line and the range between the minimum and maximum values is filled in a light color.

  • axes (list[matplotlib.axes.Axes]) –

    List of pre-existing axes for the plot.

    If None, each validation curve is plotted in a separate figure.
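The two plot_stats options summarize the per-fold scores differently, which matters when one fold is an outlier. The sketch below computes the center line and the band for each option with the standard library. The helper summarize_folds is hypothetical, the fold scores are made up, and the use of the population standard deviation for the 'mean' band is an assumption.

```python
from statistics import mean, median, pstdev


def summarize_folds(fold_scores, plot_stats="mean"):
    """Return (center, lower, upper) for one point on the curve."""
    if plot_stats == "mean":
        center = mean(fold_scores)
        spread = pstdev(fold_scores)  # population std; an assumption
        return center, center - spread, center + spread
    if plot_stats == "median":
        return median(fold_scores), min(fold_scores), max(fold_scores)
    raise ValueError(f"unknown plot_stats: {plot_stats!r}")


# Hypothetical scores from 5 folds, with one outlying fold.
fold_scores = [1.0, 2.0, 3.0, 4.0, 10.0]

mean_band = summarize_folds(fold_scores, "mean")
median_band = summarize_folds(fold_scores, "median")
print(mean_band)
print(median_band)
```

In this made-up case the outlying fold pulls the mean away from the median, so 'median' may give a more representative center line when fold scores are skewed.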

plot_feature_importances(ax=None)

Plot the feature importances of the best estimator. Available only if self.estimator is RandomForest, LightGBM, or XGBoost.

Parameters

ax (matplotlib.axes.Axes, default=None) – Pre-existing axes for the plot.

plot_first_validation_curve(estimator=None, validation_curve_params=None, cv=None, seed=None, scoring=None, not_opt_params=None, param_scales=None, plot_stats='mean', axes=None, fit_params=None)

Plot validation curve before optimization. This method is used to determine parameter range.

Parameters
  • estimator (estimator object implementing fit, default=None) –

    Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.

    Note that the parameters of the estimator are overridden by not_opt_params.

    If None, ESTIMATOR written in each tuning class is used.

  • validation_curve_params (dict(str, list(float)), default=None) –

    Dictionary with parameter names (str) as keys and lists of parameter values to be evaluated as values.

    If None, VALIDATION_CURVE_PARAMS written in each tuning class is used.

  • cv (int, cross-validation generator, or an iterable, default=5) –

    Determines the cross-validation splitting strategy.

    If int, it specifies the number of folds in a KFold.

  • seed (int, default=42) –

    Seed for the random number generators of the estimator and cross-validation.

    Note that “random_state” in not_opt_params is overridden by this argument.

  • scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification.) – Strategy to evaluate the performance of the cross-validated model on the test set.

  • not_opt_params (dict, default=None) –

    Dictionary of parameters that are NOT optimized.

    Note that these parameters override those of the estimator.

    If None, NOT_OPT_PARAMS written in each tuning class is used.

  • param_scales (dict(str, {'linear', 'log'}), default=None) –

    Dictionary of parameter scales.

    If ‘linear’, the axis of the result graph is drawn on a linear scale.

    If ‘log’, the axis of the result graph is drawn on a log scale.

    If None, PARAM_SCALES written in each tuning class is used.

  • plot_stats ({'mean', 'median'}) –

    Statistic plotted as the validation curve.

    If ‘mean’, mean values are plotted as a dark line and the standard deviation range is filled in a light color.

    If ‘median’, median values are plotted as a dark line and the range between the minimum and maximum values is filled in a light color.

  • axes (list[matplotlib.axes.Axes]) –

    List of pre-existing axes for the plot.

    If None, each validation curve is plotted in a separate figure.

  • fit_params (dict, default=None) –

    Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

    If None, FIT_PARAMS written in each tuning class is used.

    Note that if eval_set is None, self.X and self.y are set to eval_set automatically.
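The s__p naming rule for pipelines in fit_params follows the scikit-learn convention of joining the step name and the parameter name with a double underscore. A minimal sketch of building such keys is shown here; the helper prefix_fit_params and the step name "lgbmr" are hypothetical, and the parameter values are placeholders.

```python
def prefix_fit_params(fit_params, step_name):
    """Prefix each key with '<step_name>__' so a pipeline routes it to that step."""
    return {f"{step_name}__{key}": value for key, value in fit_params.items()}


# Fit parameters written for the bare estimator.
fit_params = {"eval_metric": "rmse", "verbose": 0}

# The same parameters addressed to a pipeline step named "lgbmr" (hypothetical).
pipeline_fit_params = prefix_fit_params(fit_params, "lgbmr")
print(pipeline_fit_params)
```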

plot_search_history(ax=None, x_axis='index', plot_kws=None)

Plot high score history of optimization.

Parameters
  • ax (matplotlib.axes.Axes, default=None) – Pre-existing axes for the plot.

  • x_axis (str, optional) –

    Type of x axis.

    If ‘index’, the iteration index is put on the x axis.

    If ‘time’, the elapsed time is put on the x axis.

  • plot_kws (dict, optional) –

    Additional parameters passed to matplotlib.axes.Axes.plot(), e.g. alpha.

    See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.plot.html

plot_search_map(order=None, pair_n=4, rounddigits_title=3, rank_number=None, rounddigits_score=3, subplot_kws=None, heat_kws=None, scatter_kws=None)

Plot score map. Values of parameters are plotted as X and Y axes. Scores are plotted as color density.

If self.tuning_algo is ‘grid’, the map is plotted as heat map.

Else, the map is plotted as scatter plot.

Parameters
  • order (list[str], default=None) –

    Axis order of parameters. Parameters are assigned to axes in the following order: x-axis of each graph, y-axis of each graph, y-axis of all graphs, x-axis of all graphs.

    If None, the axis order of parameters is determined by parameter importance, which is calculated by RandomForestRegressor using parameter values as X and score values as y.

  • pair_n (int, default=4) – Number of rows/columns of the maps. Available only if the number of parameters is three or more. If self.tuning_algo is ‘grid’, this argument is NOT available.

  • rounddigits_title (int, default=3) – Number of decimal digits to which the parameter range values displayed in graph titles are rounded. If self.tuning_algo is ‘grid’, this argument is NOT available.

  • rank_number (int, default=None) – Number of top-scoring data points to emphasize.

  • rounddigits_score (int, default=3) – Number of decimal digits to which the scores (errors, in regression) of the top-ranked data points are rounded.

  • subplot_kws (dict, default=None) –

    Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize.

    See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html

  • heat_kws (dict, default=None) –

    Additional parameters passed to sns.heatmap(), e.g. cmap. Available only if self.tuning_algo is ‘grid’.

    See https://seaborn.pydata.org/generated/seaborn.heatmap.html

  • scatter_kws (dict, default=None) –

    Additional parameters passed to matplotlib.pyplot.scatter(), e.g. alpha. Available only if self.tuning_algo is NOT ‘grid’.

    See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html

random_search_tuning(estimator=None, tuning_params=None, cv=None, seed=None, scoring=None, n_iter=None, not_opt_params=None, param_scales=None, mlflow_logging=None, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, rand_kws=None, fit_params=None)

Run random search optimization.

Parameters
  • estimator (estimator object implementing fit, default=None) –

    Classification or regression estimator to be tuned. It is assumed to implement the scikit-learn estimator interface.

    Note that the parameters of the estimator are overridden by not_opt_params.

    If None, ESTIMATOR written in each tuning class is used.

  • tuning_params (dict(str, tuple(float, float)), default=None) –

    Dictionary with parameter names (str) as keys and distributions or lists of parameters to try as values.

    If None, CV_PARAMS_RANDOM written in each tuning class is used.

  • cv (int, cross-validation generator, or an iterable, default=5) –

    Determines the cross-validation splitting strategy.

    If int, it specifies the number of folds in a KFold.

  • seed (int, default=42) –

    Seed for the random number generators of the estimator and cross-validation.

    Note that “random_state” in not_opt_params is overridden by this argument.

  • scoring (str, callable, list, tuple or dict, default='neg_root_mean_squared_error' in regression, 'logloss' in classification.) – Strategy to evaluate the performance of the cross-validated model on the test set.

  • n_iter (int, default=None) –

    Number of parameter settings that are sampled. n_iter trades off runtime vs quality of the solution.

    If None, N_ITER_RANDOM written in each tuning class is used.

  • not_opt_params (dict, default=None) –

    Dictionary of parameters that are NOT optimized.

    Note that these parameters override those of the estimator.

    If None, NOT_OPT_PARAMS written in each tuning class is used.

  • param_scales (dict[str, {'linear', 'log'}], default=None) –

    Dictionary of parameter scales.

    If ‘linear’, the axis of the result graph is drawn on a linear scale.

    If ‘log’, the axis of the result graph is drawn on a log scale.

    If None, PARAM_SCALES written in each tuning class is used.

  • mlflow_logging (str, default=None) –

    Strategy for recording the results with the MLflow library.

    If ‘inside’, the MLflow run is started inside the tuning instance, so you do not need to call start_run() explicitly.

    If ‘outside’, the MLflow run is NOT started inside the tuning instance, so you must call start_run() yourself, outside the tune-easy library.

    If None, mlflow is not used.

  • mlflow_tracking_uri (str, default=None) –

    Tracking URI for MLflow. This argument is passed to mlflow.set_tracking_uri().

    See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri

  • mlflow_artifact_location (str, default=None) –

    Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment()

    See https://mlflow.org/docs/latest/tracking.html#artifact-stores

  • mlflow_experiment_name (str, default=None) –

    Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment()

    See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment

  • rand_kws (dict, default=None) –

    Additional parameters passed to sklearn.model_selection.RandomizedSearchCV, e.g. n_jobs.

    Note that estimator, param_distributions, cv, and scoring cannot be passed in this argument.

    See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html

  • fit_params (dict, default=None) –

    Parameters passed to the fit() method of the estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor or LGBMRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

    If None, FIT_PARAMS written in each tuning class is used.

    Note that if eval_set is None, self.X and self.y are set to eval_set automatically.

Returns

  • best_params (dict[str, float]) – Returns best parameters determined by optimization

  • best_score (float) – Returns best score determined by optimization.
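Unlike a grid search, a random search evaluates only n_iter settings drawn from tuning_params, and fixing seed makes the draw reproducible. The sketch below illustrates that idea with the standard library for the case where every value is a list. It is a simplification, not what the library runs: the helper sample_settings and the parameter names are hypothetical, and this sketch draws with replacement, whereas scikit-learn samples without replacement when all parameters are given as lists.

```python
import random


def sample_settings(tuning_params, n_iter, seed):
    """Draw n_iter parameter settings uniformly from the given lists."""
    rng = random.Random(seed)  # a local generator keeps the draw reproducible
    return [
        {name: rng.choice(values) for name, values in tuning_params.items()}
        for _ in range(n_iter)
    ]


# Hypothetical search space given as lists of candidate values.
tuning_params = {"gamma": [0.001, 0.01, 0.1, 1.0], "C": [0.1, 1, 10, 100]}

settings = sample_settings(tuning_params, n_iter=5, seed=42)
repeat = sample_settings(tuning_params, n_iter=5, seed=42)

for setting in settings:
    print(setting)
```

Here 5 settings are evaluated instead of the 16 combinations a full grid would need, which is the runtime versus quality trade-off that n_iter controls.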