All-in-one tuning class

tune_easy.all_in_one_tuning module

class tune_easy.all_in_one_tuning.AllInOneTuning

Bases: object

LEARNING_ALGOS = {'binary': ['svm', 'logistic', 'randomforest', 'lightgbm'], 'multiclass': ['svm', 'logistic', 'randomforest', 'lightgbm'], 'regression': ['linear_regression', 'elasticnet', 'svr', 'randomforest', 'lightgbm']}
N_ITER = {'binary': {'lightgbm': 200, 'logistic': 500, 'randomforest': 300, 'svm': 500, 'xgboost': 100}, 'multiclass': {'lightgbm': 200, 'logistic': 500, 'randomforest': 300, 'svm': 50, 'xgboost': 100}, 'regression': {'elasticnet': 500, 'lightgbm': 200, 'randomforest': 300, 'svr': 500, 'xgboost': 100}}
OTHER_SCORES = {'binary': ['accuracy', 'precision', 'recall', 'f1', 'logloss', 'auc'], 'multiclass': ['accuracy', 'precision_macro', 'recall_macro', 'f1_macro', 'logloss', 'auc_ovr'], 'regression': ['rmse', 'mae', 'mape', 'r2']}
SCORING = {'binary': 'logloss', 'multiclass': 'logloss', 'regression': 'rmse'}
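
The constants above supply the defaults whenever the corresponding arguments are left as None. A minimal sketch of inspecting them (assuming only the class documented on this page):

    from tune_easy.all_in_one_tuning import AllInOneTuning

    # Defaults applied when scoring, learning_algos, or n_iter is None
    print(AllInOneTuning.SCORING['regression'])        # 'rmse'
    print(AllInOneTuning.LEARNING_ALGOS['binary'])     # ['svm', 'logistic', 'randomforest', 'lightgbm']
    print(AllInOneTuning.N_ITER['multiclass']['svm'])  # 50
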
all_in_one_tuning(x, y, data=None, x_colnames=None, cv_group=None, objective=None, scoring=None, other_scores=None, learning_algos=None, n_iter=None, cv=5, tuning_algo='optuna', seed=42, estimators=None, tuning_params=None, mlflow_logging=False, mlflow_tracking_uri=None, mlflow_artifact_location=None, mlflow_experiment_name=None, tuning_kws=None)

Parameter tuning with multiple estimators, extremely easy to use.

Parameters
  • x (list[str] or numpy.ndarray) – Explanatory variables. Should be list[str] if data is pd.DataFrame. Should be numpy.ndarray if data is None.

  • y (str or numpy.ndarray) – Target variable. Should be str if data is pd.DataFrame. Should be numpy.ndarray if data is None.

  • data (pd.DataFrame, default=None) – Input data structure.

  • x_colnames (list[str], default=None) – Names of explanatory variables. Available only if data is NOT pd.DataFrame.

  • cv_group (str or numpy.ndarray, default=None) – Grouping variable that will be used for GroupKFold or LeaveOneGroupOut. Should be str if data is pd.DataFrame.

  • objective ({'classification', 'regression'}, default=None) – Specify the learning task. If None, the task is selected automatically based on the target variable.

  • scoring (str, default=None) –

    Score name used for parameter tuning.

    • In regression:
      • ’rmse’ : Root mean squared error

      • ’mse’ : Mean squared error

      • ’mae’ : Mean absolute error

      • ’rmsle’ : Root mean squared logarithmic error

      • ’mape’ : Mean absolute percentage error

      • ’r2’ : R2 Score

    • In binary classification:
      • ’logloss’ : Logarithmic Loss

      • ’accuracy’ : Accuracy

      • ’precision’ : Precision

      • ’recall’ : Recall

      • ’f1’ : F1 score

      • ’pr_auc’ : PR-AUC

      • ’auc’ : AUC

    • In multiclass classification:
      • ’logloss’ : Logarithmic Loss

      • ’accuracy’ : Accuracy

      • ’precision_macro’ : Precision macro

      • ’recall_macro’ : Recall macro

      • ’f1_micro’ : F1 micro

      • ’f1_macro’ : F1 macro

      • ’f1_weighted’ : F1 weighted

      • ’auc_ovr’ : One-vs-rest AUC

      • ’auc_ovo’ : One-vs-one AUC

      • ’auc_ovr_weighted’ : One-vs-rest AUC weighted

      • ’auc_ovo_weighted’ : One-vs-one AUC weighted

    If None, the SCORING constant is used.

    See https://c60evaporator.github.io/tune-easy/all_in_one_tuning.html#tune_easy.all_in_one_tuning.AllInOneTuning.SCORING

  • other_scores (list[str], default=None) –

    Score names calculated after tuning. Available score names are listed in the explanation of the scoring argument.

    If None, the OTHER_SCORES constant is used.

    See https://c60evaporator.github.io/tune-easy/all_in_one_tuning.html#tune_easy.all_in_one_tuning.AllInOneTuning.OTHER_SCORES

  • learning_algos (list[str], default=None) –

    Estimator algorithms. Select from the following algorithms and pass them as a list.

    • In regression:
      • ’linear_regression’ : LinearRegression

      • ’elasticnet’ : ElasticNet

      • ’svr’ : SVR

      • ’randomforest’ : RandomForestRegressor

      • ’lightgbm’ : LGBMRegressor

      • ’xgboost’ : XGBRegressor

    • In classification:
      • ’svm’ : SVC

      • ’logistic’ : LogisticRegression

      • ’randomforest’ : RandomForestClassifier

      • ’lightgbm’ : LGBMClassifier

      • ’xgboost’ : XGBClassifier

    If None, the LEARNING_ALGOS constant is used.

    See https://c60evaporator.github.io/tune-easy/all_in_one_tuning.html#tune_easy.all_in_one_tuning.AllInOneTuning.LEARNING_ALGOS

  • n_iter (dict[str, int], default=None) –

    Number of iterations for parameter tuning. Keys should be members of the learning_algos argument. Values should be iteration counts.

    If None, the N_ITER constant is used.

    See https://c60evaporator.github.io/tune-easy/all_in_one_tuning.html#tune_easy.all_in_one_tuning.AllInOneTuning.N_ITER

  • cv (int, cross-validation generator, or an iterable, default=5) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross-validation is used. If int, specifies the number of folds in a KFold.

  • tuning_algo ({'grid', 'random', 'bo', 'optuna'}, default='optuna') – Tuning algorithm, implemented by the following libraries: ‘grid’: sklearn.model_selection.GridSearchCV, ‘random’: sklearn.model_selection.RandomizedSearchCV, ‘bo’: BayesianOptimization, ‘optuna’: Optuna.

  • seed (int, default=42) – Seed for the random number generators of cross-validation, estimators, and optuna.sampler.

  • estimators (dict[str, estimator object implementing 'fit'], default=None) –

    Classification or regression estimators used for tuning. Keys should be members of the learning_algos argument. Values are assumed to implement the scikit-learn estimator interface.

    If None, the default estimators of the tuning instances are used.

    See https://c60evaporator.github.io/tune-easy/each_estimators.html

  • tuning_params (dict[str, dict[str, {list, tuple}]], default=None) –

    Keys should be members of the learning_algos argument. Values should be dictionaries with parameter names as keys and lists of parameter settings or parameter ranges to try as values; see the sketch after this parameter list.

    If None, the default values of the tuning instances are used.

    See https://c60evaporator.github.io/tune-easy/each_estimators.html

  • mlflow_logging (bool, default=False) –

    Strategy to record the result by MLflow library.

    If True, nested runs are created. The parent run records a comparison of all estimators, such as the max score history. The child runs are created by each tuning instance with its mlflow_logging argument set to “outside”.

    If False, MLflow runs are not created.

  • mlflow_tracking_uri (str, default=None) –

    Tracking URI for MLflow. This argument is passed to mlflow.set_tracking_uri().

    See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri

  • mlflow_artifact_location (str, default=None) –

    Artifact store for MLflow. This argument is passed to artifact_location in mlflow.create_experiment().

    See https://mlflow.org/docs/latest/tracking.html#artifact-stores

  • mlflow_experiment_name (str, default=None) –

    Experiment name for MLflow. This argument is passed to name in mlflow.create_experiment().

    See https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment

  • tuning_kws (dict[str, dict], default=None) –

    Additional parameters passed to the tuning instances. Keys should be members of the learning_algos argument. Values should be dicts of parameters passed to the tuning instances, e.g. {'not_opt_params': {'kernel': 'rbf'}} (see the sketch after this parameter list).

    See API Reference of tuning instances.
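
Because n_iter, tuning_params, and tuning_kws are all dictionaries keyed by learning-algorithm name, their shapes are easy to get wrong. A minimal sketch of plausible values (the inner parameter names and ranges are illustrative, not the library's verified defaults):

    # Keys must match entries in learning_algos; inner contents are illustrative.
    n_iter = {'svm': 200, 'randomforest': 100}

    tuning_params = {
        # Parameter names as keys; candidate lists or (min, max) ranges as values
        'svm': {'C': (0.01, 100), 'gamma': (0.0001, 10)},
        'randomforest': {'n_estimators': [50, 100, 200], 'max_depth': (2, 16)},
    }

    tuning_kws = {
        # Extra keyword arguments forwarded to each tuning instance
        'svm': {'not_opt_params': {'kernel': 'rbf'}},
    }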

Returns

df_result – Validation scores of the model before and after tuning.

Return type

pd.DataFrame
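
A minimal end-to-end sketch of the pd.DataFrame interface (the dataset and column names come from scikit-learn's iris data; instantiating AllInOneTuning with no constructor arguments is assumed):

    from sklearn.datasets import load_iris
    from tune_easy.all_in_one_tuning import AllInOneTuning

    # One DataFrame holding the feature columns plus a 'target' column
    df = load_iris(as_frame=True).frame

    tuner = AllInOneTuning()
    df_result = tuner.all_in_one_tuning(
        x=['petal length (cm)', 'petal width (cm)'],  # list[str] because data is a DataFrame
        y='target',
        data=df,
        learning_algos=['svm', 'logistic'],  # restrict the comparison to two algorithms
        n_iter={'svm': 50, 'logistic': 50},  # fewer iterations for a quick run
        cv=3,
        tuning_algo='optuna',
        seed=42,
    )
    print(df_result)  # validation scores before and after tuning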

print_estimator(learner_name, printed_name, mlflow_logging=False)

Print estimator after tuning

Parameters

learner_name ({'linear_regression', 'elasticnet', 'svr', 'randomforest', 'lightgbm', 'xgboost', 'svm', 'logistic'}) – Name of the learning algorithm whose tuned estimator is printed.
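
A brief sketch of calling it after the tuning run shown above (the printed_name string is only a display label):

    # Print the tuned SVM estimator, labelled 'SVM after tuning' in the output
    tuner.print_estimator('svm', 'SVM after tuning')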