seaborn_analyzer package

seaborn_analyzer.custom_hist_plot module

class seaborn_analyzer.custom_hist_plot.hist

Bases: object

classmethod fit_dist(data: pandas.core.frame.DataFrame, x: Optional[str] = None, hue=None, dist='norm', ax=None, binwidth=None, bins='auto', norm_hist=True, floc=None, sigmarange=4, linecolor='red', linesplit=200, hist_kws={})

Fit distributions by maximum likelihood estimation, and calculate fitting scores.

Parameters
  • data (pd.DataFrame, pd.Series, or np.ndarray) – Input data structure. Either a long-form collection of vectors that can be assigned to named variables or a wide-form dataset that will be internally reshaped.

  • x (str, optional) – Variable that specifies positions on the x-axis. Available only if data is pd.DataFrame.

  • hue (str, pd.Series, or np.ndarray, optional) – Semantic variable that is mapped to determine the color of plot elements. If data is pd.DataFrame, the argument must be a key in data.

  • dist ({'norm', 'lognorm', 'gamma', 't', 'expon', 'uniform', 'chi2', 'weibull'} or list, optional) – Type of fitting distribution or list of distributions.

  • ax (matplotlib.axes.Axes, optional) – Pre-existing axes for the plot. Otherwise, call matplotlib.pyplot.gca() internally.

  • binwidth (float, optional) – Width of each bin, overrides bins.

  • bins (int, optional) – Generic bin parameter that can be the name of a reference rule, the number of bins, or the breaks of the bins. Passed to numpy.histogram_bin_edges().

  • norm_hist (bool, optional) – If True, the histogram height shows a density rather than a count.

  • floc (float, optional) – Hold the location parameter fixed to the specified value. If None, the location parameter is fitted by maximum likelihood estimation except when dist is ‘weibull’ or ‘expon’. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.fit.html#scipy.stats.rv_continuous.fit

  • sigmarange (float, optional) – Set the x-axis view limits. The lower limit is -sigmarange * std(data) + mean(data). The higher limit is sigmarange * std(data) + mean(data).

  • linecolor (str or List[str], optional) – Color of fitting line or colors of fitting lines. See https://matplotlib.org/stable/gallery/color/named_colors.html

  • linesplit (int, optional) – Number of fitting line divisions.

  • hist_kws (dict, optional) – Additional parameters passed to seaborn.histplot() other than the above arguments.

Returns

  • all_params (dict) – Parameters estimated by maximum likelihood estimation.

  • all_scores (dict) – Fitting scores, which consist of RSS, AIC, and BIC.
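
As a rough illustration of what fitting by maximum likelihood and scoring with AIC and BIC involve, the sketch below fits a normal distribution in plain Python. It uses the textbook definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L; the helper names are hypothetical, and how fit_dist itself computes its scores (including RSS against the histogram) is not stated here, so this is not a description of the library's internals.

```python
import math

def fit_normal_mle(values):
    """Return the maximum likelihood estimates (loc, scale) of a normal distribution."""
    n = len(values)
    loc = sum(values) / n
    # The MLE of the variance divides by n, not n - 1.
    scale = math.sqrt(sum((v - loc) ** 2 for v in values) / n)
    return loc, scale

def normal_log_likelihood(values, loc, scale):
    """Log-likelihood of the data under N(loc, scale**2)."""
    n = len(values)
    squared_error = sum((v - loc) ** 2 for v in values)
    return -0.5 * n * math.log(2 * math.pi * scale ** 2) - squared_error / (2 * scale ** 2)

def aic_bic(log_likelihood, n_params, n_samples):
    """Textbook AIC and BIC from a maximized log-likelihood."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_samples) - 2 * log_likelihood
    return aic, bic

values = [4.8, 5.1, 5.3, 4.9, 5.6, 5.0, 5.2, 4.7]
loc, scale = fit_normal_mle(values)
ll = normal_log_likelihood(values, loc, scale)
aic, bic = aic_bic(ll, n_params=2, n_samples=len(values))
print(f"loc={loc:.3f} scale={scale:.3f} AIC={aic:.3f} BIC={bic:.3f}")
```

Lower AIC or BIC indicates a better trade-off between fit and number of parameters, which is why such scores are useful when comparing several candidate distributions.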

classmethod plot_normality(data: pandas.core.frame.DataFrame, x: Optional[str] = None, hue=None, binwidth=None, bins='auto', norm_hist=False, sigmarange=4, linesplit=200, rounddigit=5, hist_kws={}, subplot_kws={})

Plot normality test result and QQ plot.

Parameters
  • data (pd.DataFrame, pd.Series, or np.ndarray) – Input data structure. Either a long-form collection of vectors that can be assigned to named variables or a wide-form dataset that will be internally reshaped.

  • x (str, optional) – Variable that specifies positions on the x-axis. Available only if data is pd.DataFrame.

  • hue (str, optional) – Semantic variable that is mapped to determine the color of plot elements. Available only if data is pd.DataFrame.

  • binwidth (float, optional) – Width of each bin, overrides bins.

  • bins (int, optional) – Generic bin parameter that can be the name of a reference rule, the number of bins, or the breaks of the bins. Passed to numpy.histogram_bin_edges().

  • norm_hist (bool, optional) – If True, the histogram height shows a density rather than a count.

  • sigmarange (float, optional) – Set the x-axis view limits. The lower limit is -sigmarange * std(data) + mean(data). The higher limit is sigmarange * std(data) + mean(data).

  • linesplit (int, optional) – Number of fitting line divisions.

  • rounddigit (int, optional) – Number of decimal digits to which the displayed scores are rounded.

  • hist_kws (dict, optional) – Additional parameters passed to seaborn.histplot() other than the above arguments.

  • subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
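
The sigmarange limits described above can be sketched in a few lines. The helper name is hypothetical, and the use of the population standard deviation is an assumption, since the text only says std(data).

```python
import statistics

def sigma_view_limits(values, sigmarange=4):
    """Return (lower, upper) x-axis limits: mean -/+ sigmarange * std."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation; the docs do not say which one is used.
    std = statistics.pstdev(values)
    return mean - sigmarange * std, mean + sigmarange * std

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, population std 2
lower, upper = sigma_view_limits(values, sigmarange=4)
print(lower, upper)
```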

seaborn_analyzer.custom_pair_plot module

class seaborn_analyzer.custom_pair_plot.CustomPairPlot

Bases: object

pairanalyzer(df, hue=None, palette=None, vars=None, lowerkind='boxscatter', diag_kind='kde', markers=None, height=2.5, aspect=1, dropna=True, lower_kws={}, diag_kws={}, grid_kws={})

Plot a pair plot that shows scatter plots and a correlation coefficient matrix simultaneously. This method mainly uses the seaborn.PairGrid class.

Parameters
  • df (pd.DataFrame) – Input data structure. Int, float, and bool columns are displayed in the output graph.

  • hue (str) – Variable in data to map plot aspects to different colors.

  • palette (str or dict[str]) – Set of colors for mapping the hue variable. If a dict, keys should be values in the hue variable.

  • vars (list[str]) – Variables within data to use, otherwise use every column with a numeric datatype.

  • lowerkind ({'boxscatter', 'scatter', or 'reg'}) – Kind of plot for the lower triangular subplots.

  • diag_kind ({'kde' or 'hist'}) – Kind of plot for the diagonal subplots.

  • markers (str or list[str]) – Marker to use for all scatterplot points or a list of markers. See https://matplotlib.org/stable/api/markers_api.html

  • height (float) – Height (in inches) of each facet.

  • aspect (float) – Aspect * height gives the width (in inches) of each facet.

  • dropna (bool) – Drop missing values from the data before plotting.

  • lower_kws (dict) – Additional parameters passed to seaborn.PairGrid.map_lower(). If lowerkind is ‘scatter’, the arguments are applied to seaborn.scatterplot method of the lower subplots.

  • diag_kws (dict) – Additional parameters passed to seaborn.PairGrid.map_diag(). If diag_kind is ‘kde’, the arguments are applied to the seaborn.kdeplot method of the diagonal subplots.

  • grid_kws (dict) – Additional parameters passed to seaborn.PairGrid.__init__() other than the above arguments. See https://seaborn.pydata.org/generated/seaborn.PairGrid.html

seaborn_analyzer.custom_class_plot module

class seaborn_analyzer.custom_class_plot.classplot

Bases: object

classmethod class_proba_plot(clf, x: List[str], y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colnames: Optional[List[str]] = None, x_chart: Optional[List[str]] = None, pair_sigmarange=1.0, pair_sigmainterval=0.5, chart_extendsigma=0.5, chart_scale=1, plot_border=True, plot_scatter='class', rounddigit_x3=2, proba_class=None, proba_cmap_dict=None, proba_type='contourf', scatter_colors=None, true_marker='o', false_marker='x', cv=None, cv_seed=42, cv_group=None, display_cv_indices=0, clf_params=None, fit_params=None, eval_set_selection=None, subplot_kws=None, contourf_kws=None, imshow_kws=None, scatter_kws=None, legend_kws=None)

Plot class prediction probability of any scikit-learn classifier with 2 to 4D explanatory variables.

Parameters
  • clf (classifier object implementing fit) – Classifier. This is assumed to implement the scikit-learn estimator interface.

  • x (list[str], or np.ndarray) – Explanatory variables. Should be list[str] if data is pd.DataFrame. Should be np.ndarray if data is None

  • y (str or np.ndarray) – Objective variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None

  • data (pd.DataFrame, optional) – Input data structure.

  • x_colnames (list[str], optional) – Names of explanatory variables. Available only if data is NOT pd.DataFrame

  • x_chart (list[str], optional) – X-axis and y-axis variables of separation map. If None, use two variables in x from the front.

  • pair_sigmarange (float, optional) – Set the range of subplots. The lower limit is mean({x3, x4}) - pair_sigmarange * std({x3, x4}). The higher limit is mean({x3, x4}) + pair_sigmarange * std({x3, x4}). Available only if len(x) is bigger than 2.

  • pair_sigmainterval (float, optional) – Set the interval of subplots. For example, if pair_sigmainterval is set to 0.5 and pair_sigmarange is set to 1.0, The ranges of subplots are lower than μ-1σ, μ-1σ to μ-0.5σ, μ-0.5σ to μ, μ to μ+0.5σ, μ+0.5σ to μ+1σ, and higher than μ+1σ. Available only if len(x) is bigger than 2.

  • chart_extendsigma (float, optional) – Set the axis view limits of the separation map. The lower limit is min({x1, x2}) - std({x1, x2}) * chart_extendsigma. The higher limit is max({x1, x2}) + std({x1, x2}) * chart_extendsigma

  • chart_scale (int, optional) – Set the resolution of the separation lines. If plotting speed is slow, we recommend setting chart_scale to 2. We DON’T recommend setting it larger than 3 because of jaggies.

  • plot_border (bool, optional) – If True, display class separation lines

  • plot_scatter ({'error', 'class', 'class_error', None}, optional) – How scatter points are styled. If ‘error’, colors indicate whether each prediction is correct. If ‘class’, colors indicate class labels. If ‘class_error’, colors indicate class labels and marker styles indicate whether each prediction is correct. If None, no scatter plot is drawn.

  • rounddigit_x3 (int, optional) – Number of decimal digits to which the y-axis variable of subplots is rounded.

  • proba_class (str or list[str], optional) – Class label name(s) for which the probability map is displayed.

  • proba_cmap_dict (dict[str, str], optional) – Colormap of probability map. The keys must be class label name and the values must be colormap names in Matplotlib. See https://matplotlib.org/stable/tutorials/colors/colormaps.html

  • proba_type ({'contourf', 'contour', 'imshow'}, optional) – Plotting type of probability map. If ‘contourf’, mapped by matplotlib.pyplot.contourf(). If ‘contour’, mapped by matplotlib.pyplot.contour(). If ‘imshow’, mapped by matplotlib.pyplot.imshow(). ‘imshow’ is available only if the number of class labels is less than 4.

  • scatter_colors (list[str], optional) – Set of colors for mapping the class labels. Available only if plot_scatter is set to ‘class’ or ‘class_error’.

  • true_marker (str, optional) – Marker style of True label. Available only if plot_scatter is set to ‘error’ or ‘class_error’. See https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers

  • false_marker (str, optional) – Marker style of False label. Available only if plot_scatter is set to ‘error’ or ‘class_error’. See https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers

  • cv (int or sklearn.model_selection.*, optional) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, it specifies the number of folds in a KFold.

  • cv_seed (int, optional) – Seed for random number generator of cross validation.

  • cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to groups argument of cv.split().

  • display_cv_indices (int, optional) – Cross validation index or indices to display.

  • clf_params (dict, optional) – Parameters passed to the classifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

  • fit_params (dict, optional) – Parameters passed to the fit() method of the classifier, e.g. early_stopping_rounds and eval_set of XGBClassifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

  • eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –

    Select data passed to eval_set in fit_params. Available only if “estimator” is LightGBM or XGBoost.

    If “all”, use all data in X and y.

    If “train”, select train data from X and y using cv.split().

    If “test”, select test data from X and y using cv.split().

    If “original”, use raw eval_set.

    If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.

  • subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html

  • contourf_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.contourf() if proba_type is set to ‘contourf’, or additional parameters passed to matplotlib.pyplot.contour() if proba_type is set to ‘contour’. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.contourf.html or https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.contour.html

  • imshow_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.imshow(), e.g. alpha. Available only if proba_type is set to ‘imshow’. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html

  • scatter_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.scatter(), e.g. alpha. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html

  • legend_kws (dict) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html
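
The subplot ranges produced by pair_sigmarange and pair_sigmainterval, as described in the parameter list above, can be enumerated with a short sketch. The helper names are hypothetical, and the sketch assumes pair_sigmarange is a whole multiple of pair_sigmainterval; how the library treats other combinations is not stated.

```python
def sigma_bin_edges(pair_sigmarange=1.0, pair_sigmainterval=0.5):
    """Return bin edges as multiples of sigma relative to the mean."""
    # Assumption: the range is a whole multiple of the interval.
    n_steps = round(2 * pair_sigmarange / pair_sigmainterval)
    return [-pair_sigmarange + i * pair_sigmainterval for i in range(n_steps + 1)]

def sigma_bins(mean, std, pair_sigmarange=1.0, pair_sigmainterval=0.5):
    """Return (lower, upper) value ranges, including the two open-ended outer ranges."""
    edges = [mean + k * std for k in sigma_bin_edges(pair_sigmarange, pair_sigmainterval)]
    bounds = [float("-inf")] + edges + [float("inf")]
    return list(zip(bounds[:-1], bounds[1:]))

# With mean 10 and std 2, the default settings give six ranges.
for lower, upper in sigma_bins(10.0, 2.0):
    print(lower, upper)
```

With the defaults this yields the six ranges listed for pair_sigmainterval: below μ-1σ, four inner ranges of width 0.5σ, and above μ+1σ.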

classmethod class_separator_plot(clf, x: List[str], y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colnames: Optional[List[str]] = None, x_chart: Optional[List[str]] = None, pair_sigmarange=1.0, pair_sigmainterval=0.5, chart_extendsigma=0.5, chart_scale=1, plot_scatter='class_error', rounddigit_x3=2, scatter_colors=None, true_marker='o', false_marker='x', cv=None, cv_seed=42, cv_group=None, display_cv_indices=0, clf_params=None, fit_params=None, eval_set_selection=None, subplot_kws=None, contourf_kws=None, scatter_kws=None, legend_kws=None)

Plot class separation lines of any scikit-learn classifier with 2 to 4D explanatory variables.

Parameters
  • clf (classifier object implementing fit) – Classifier. This is assumed to implement the scikit-learn estimator interface.

  • x (list[str], or np.ndarray) – Explanatory variables. Should be list[str] if data is pd.DataFrame. Should be np.ndarray if data is None

  • y (str or np.ndarray) – Objective variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None

  • data (pd.DataFrame) – Input data structure.

  • x_colnames (list[str], optional) – Names of explanatory variables. Available only if data is NOT pd.DataFrame

  • x_chart (list[str], optional) – X-axis and y-axis variables of separation map. If None, use two variables in x from the front.

  • pair_sigmarange (float, optional) – Set the range of subplots. The lower limit is mean({x3, x4}) - pair_sigmarange * std({x3, x4}). The higher limit is mean({x3, x4}) + pair_sigmarange * std({x3, x4}). Available only if len(x) is bigger than 2.

  • pair_sigmainterval (float, optional) – Set the interval of subplots. For example, if pair_sigmainterval is set to 0.5 and pair_sigmarange is set to 1.0, The ranges of subplots are lower than μ-1σ, μ-1σ to μ-0.5σ, μ-0.5σ to μ, μ to μ+0.5σ, μ+0.5σ to μ+1σ, and higher than μ+1σ. Available only if len(x) is bigger than 2.

  • chart_extendsigma (float, optional) – Set the axis view limits of the separation map. The lower limit is min({x1, x2}) - std({x1, x2}) * chart_extendsigma. The higher limit is max({x1, x2}) + std({x1, x2}) * chart_extendsigma

  • chart_scale (int, optional) – Set the resolution of the separation lines. If plotting speed is slow, we recommend setting chart_scale to 2. We DON’T recommend setting it larger than 3 because of jaggies.

  • plot_scatter ({'error', 'class', 'class_error', None}, optional) – How scatter points are styled. If ‘error’, colors indicate whether each prediction is correct. If ‘class’, colors indicate class labels. If ‘class_error’, colors indicate class labels and marker styles indicate whether each prediction is correct. If None, no scatter plot is drawn.

  • rounddigit_x3 (int, optional) – Number of decimal digits to which the y-axis variable of subplots is rounded.

  • scatter_colors (list[str], optional) – Set of colors for mapping the class labels. Available only if plot_scatter is set to ‘class’ or ‘class_error’.

  • true_marker (str, optional) – Marker style of True label. Available only if plot_scatter is set to ‘error’ or ‘class_error’. See https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers

  • false_marker (str, optional) – Marker style of False label. Available only if plot_scatter is set to ‘error’ or ‘class_error’. See https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers

  • cv (int, cross-validation generator, or an iterable, optional) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, it specifies the number of folds in a KFold.

  • cv_seed (int, optional) – Seed for random number generator of cross validation.

  • cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to groups argument of cv.split().

  • display_cv_indices (int, optional) – Cross validation index or indices to display.

  • clf_params (dict, optional) – Parameters passed to the classifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

  • fit_params (dict, optional) – Parameters passed to the fit() method of the classifier, e.g. early_stopping_rounds and eval_set of XGBClassifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

  • eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –

    Select data passed to eval_set in fit_params. Available only if “estimator” is LightGBM or XGBoost.

    If “all”, use all data in X and y.

    If “train”, select train data from X and y using cv.split().

    If “test”, select test data from X and y using cv.split().

    If “original”, use raw eval_set.

    If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.

  • subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html

  • contourf_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.contourf(), e.g. alpha. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.contourf.html

  • scatter_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.scatter(), e.g. alpha. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html

  • legend_kws (dict) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html
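
The axis limits controlled by chart_extendsigma follow directly from the formula given for that parameter. The sketch below applies it to each of the two charted variables; the helper name is hypothetical, and the choice of population standard deviation is an assumption.

```python
import statistics

def extended_axis_limits(values, chart_extendsigma=0.5):
    """Return (lower, upper) view limits for one axis of the separation map."""
    # Assumption: population standard deviation; the docs only say std().
    margin = statistics.pstdev(values) * chart_extendsigma
    return min(values) - margin, max(values) + margin

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [10.0, 10.0, 30.0, 30.0]  # population std is 10
limits = {
    "x1": extended_axis_limits(x1),
    "x2": extended_axis_limits(x2, chart_extendsigma=1.0),
}
print(limits)
```

A larger chart_extendsigma widens the map beyond the observed data, which makes it easier to see how the classifier extrapolates near the edges.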

classmethod plot_roc_curve_multiclass(estimator, X_train, y_train, *, X_test=None, y_test=None, sample_weight=None, drop_intermediate=True, response_method='predict_proba', name=None, ax=None, pos_label=None, average='macro', fit_params=None, plot_roc_kws=None, class_average_kws=None)

Plot Receiver operating characteristic (ROC) curve.

Available for both multiclass and binary classification.

Extra keyword arguments will be passed to matplotlib’s plot.

Parameters
  • estimator (estimator instance) – Fitted classifier or a fitted Pipeline in which the last estimator is a classifier.

  • X_train ({array-like, sparse matrix} of shape (n_samples, n_features)) – Input values of train data.

  • y_train (array-like of shape (n_samples,)) – Target values of train data.

  • X_test ({array-like, sparse matrix} of shape (n_samples, n_features)) – Input values of test data.

  • y_test (array-like of shape (n_samples,)) – Target values of test data.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

  • drop_intermediate (boolean, default=True) – Whether to drop some suboptimal thresholds which would not appear on a plotted ROC curve. This is useful in order to create lighter ROC curves.

  • response_method ({'predict_proba', 'decision_function'}, default='predict_proba') – Specifies which method to use for calculating class probability.

  • name (str, default=None) – Name of ROC Curve for labeling. If None, use the name of the estimator.

  • ax (matplotlib axes, default=None) – Axes object to plot on. If None, a new figure and axes is created.

  • pos_label (str or int, default=None) – The class considered as the positive class when computing the roc auc metrics. By default, estimator.classes_[1] is considered as the positive class.

  • average ({'macro', 'micro'}, default='macro') – Specifies which method to use for calculating the average of tpr and fpr.

  • fit_params (dict, default=None) – Parameters passed to the fit() method of the classifier, e.g. early_stopping_rounds and eval_set of XGBClassifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

  • plot_roc_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.plot() that draws ROC curve of each classes, e.g. lw. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html

  • class_average_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.plot() or sklearn.metrics.plot_roc_curve() that draws ROC curve of average, e.g. alpha. See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_roc_curve.html

Returns

display – Object that stores computed values.

Return type

RocCurveDisplay
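
The difference between the 'macro' and 'micro' settings of average can be illustrated at a single decision threshold. Macro averaging takes the mean of the per-class rates, while micro averaging pools the one-vs-rest counts before computing a rate. The counts and helper names below are made up for illustration; a full ROC curve repeats this over many thresholds, and the exact procedure the library follows is not described here.

```python
def tpr_fpr(tp, fn, fp, tn):
    """True positive rate and false positive rate from confusion counts."""
    return tp / (tp + fn), fp / (fp + tn)

def macro_average(counts):
    """Mean of per-class TPR and FPR; every class weighs the same."""
    rates = [tpr_fpr(*c) for c in counts]
    n = len(rates)
    return sum(r[0] for r in rates) / n, sum(r[1] for r in rates) / n

def micro_average(counts):
    """TPR and FPR from pooled counts; every sample weighs the same."""
    tp, fn, fp, tn = (sum(c[i] for c in counts) for i in range(4))
    return tpr_fpr(tp, fn, fp, tn)

# Hypothetical one-vs-rest counts per class: (tp, fn, fp, tn)
counts = [(8, 2, 5, 85), (30, 10, 8, 52), (40, 10, 9, 41)]
macro = macro_average(counts)
micro = micro_average(counts)
print("macro:", macro)
print("micro:", micro)
```

The two averages diverge when classes are imbalanced, because micro averaging is dominated by the larger classes.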

classmethod roc_plot(clf, x: List[str], y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colnames: Optional[List[str]] = None, cv=None, cv_seed=42, cv_group=None, ax=None, sample_weight=None, drop_intermediate=True, response_method='predict_proba', pos_label=None, average='macro', clf_params=None, fit_params=None, eval_set_selection=None, draw_grid=True, grid_kws=None, subplot_kws=None, legend_kws=None, plot_roc_kws=None, class_average_kws=None, cv_mean_kws=None, chance_plot_kws=None)

Plot Receiver operating characteristic (ROC) curve with cross validation.

Available for both binary and multiclass classification.

Extra keyword arguments will be passed to matplotlib’s plot.

Parameters
  • clf (classifier object implementing fit) – Fitted classifier or a fitted Pipeline in which the last estimator is a classifier.

  • x (list[str], or np.ndarray) – Explanatory variables. Should be list[str] if data is pd.DataFrame. Should be np.ndarray if data is None

  • y (str or np.ndarray) – Objective variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None

  • data (pd.DataFrame, default=None) – Input data structure.

  • x_colnames (list[str], default=None) – Names of explanatory variables. Available only if data is NOT pd.DataFrame

  • cv (int or sklearn.model_selection.*, default=None) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, it specifies the number of folds in a KFold.

  • cv_seed (int, default=42) – Seed for random number generator of cross validation.

  • cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to groups argument of cv.split().

  • ax ({matplotlib.axes.Axes, list[matplotlib.axes.Axes]}, default=None) – Pre-existing axes for the plot or list of it. Otherwise, call matplotlib.pyplot.subplot() internally.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

  • drop_intermediate (boolean, default=True) – Whether to drop some suboptimal thresholds which would not appear on a plotted ROC curve. This is useful in order to create lighter ROC curves.

  • response_method ({'predict_proba', 'decision_function'}, default='predict_proba') – Specifies which method to use for calculating class probability.

  • pos_label (str or int, default=None) – The class considered as the positive class when computing the roc auc metrics. By default, estimator.classes_[1] is considered as the positive class.

  • average ({'macro', 'micro'}, default='macro') – Specifies which method to use for calculating the average of tpr and fpr.

  • clf_params (dict, default=None) – Parameters passed to the classifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

  • fit_params (dict, default=None) – Parameters passed to the fit() method of the classifier, e.g. early_stopping_rounds and eval_set of XGBClassifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

  • eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –

    Select data passed to eval_set in fit_params. Available only if “estimator” is LightGBM or XGBoost.

    If “all”, use all data in X and y.

    If “train”, select train data from X and y using cv.split().

    If “test”, select test data from X and y using cv.split().

    If “original”, use raw eval_set.

    If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.

  • draw_grid (bool, default=True) – If True, grid lines are drawn.

  • grid_kws (dict, default=None) – Additional parameters passed to matplotlib.pyplot.grid() that draws grid lines, e.g. color. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.grid.html

  • subplot_kws (dict, default=None) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. Available only if ax is None. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html

  • legend_kws (dict) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html

  • plot_roc_kws (dict, default=None) – Additional parameters passed to matplotlib.pyplot.plot() that draws ROC curve of each classes, e.g. lw. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html

  • class_average_kws (dict, default=None) – Additional parameters passed to matplotlib.pyplot.plot() or sklearn.metrics.plot_roc_curve() that draws average ROC curve of all classes, e.g. lw. See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_roc_curve.html

  • cv_mean_kws (dict, default=None) – Additional parameters passed to matplotlib.pyplot.plot() that draws mean ROC curve of all folds of cross validation, e.g. lw. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html

  • chance_plot_kws (dict, default=None) – Additional parameters passed to matplotlib.pyplot.plot() that draws chance line, e.g. lw. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html
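
The s__p naming rule for clf_params and fit_params when the classifier is a pipeline can be applied mechanically. The helper below is hypothetical, as are the step name "svc" and the parameter values; it only shows how the prefixed keys are formed.

```python
def prefix_step_params(step_name, params):
    """Prefix each key with '<step_name>__' as required for pipeline steps."""
    return {f"{step_name}__{key}": value for key, value in params.items()}

# Hypothetical pipeline step named "svc".
clf_params = prefix_step_params("svc", {"C": 1.0, "gamma": "scale"})
fit_params = prefix_step_params("svc", {"sample_weight": None})
print(clf_params)
print(fit_params)
```

The separator is a double underscore, so a single underscore in a parameter name such as sample_weight does not conflict with the prefix.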

seaborn_analyzer.custom_reg_plot module

class seaborn_analyzer.custom_reg_plot.regplot

Bases: object

classmethod average_plot(estimator, x: List[str], y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colnames: Optional[List[str]] = None, hue=None, aggregate='mean', cv=None, cv_seed=42, cv_group=None, display_cv_indices=0, estimator_params=None, fit_params=None, eval_set_selection=None, subplot_kws=None, plot_kws=None, scatter_kws=None, legend_kws=None)

Plot relationship between one explanatory variable and predicted value by line graph.

Other explanatory variables are fixed to aggregated values such as mean values or median values.

Parameters
  • estimator (estimator object implementing fit) – Regression estimator. This is assumed to implement the scikit-learn estimator interface.

  • x (list[str] or np.ndarray) – Explanatory variables. Should be list[str] if data is pd.DataFrame. Should be np.ndarray if data is None

  • y (str or np.ndarray) – Objective variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None

  • data (pd.DataFrame) – Input data structure.

  • x_colnames (list[str], optional) – Names of explanatory variables. Available only if data is NOT pd.DataFrame

  • hue (str, optional) – Grouping variable that will produce points with different colors.

  • aggregate ({'mean', 'median'}, optional) – Statistic method of aggregating explanatory variables except x_axis variable.

  • cv (int, cross-validation generator, or an iterable, optional) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, it specifies the number of folds in a KFold.

  • cv_seed (int, optional) – Seed for random number generator of cross validation.

  • cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to groups argument of cv.split().

  • display_cv_indices (int or list, optional) – Cross validation index or indices to display.

  • estimator_params (dict, optional) – Parameters passed to the regression estimator. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

  • fit_params (dict, optional) – Parameters passed to the fit() method of the regression estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.

  • eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –

    Select data passed to eval_set in fit_params. Available only if estimator is LightGBM or XGBoost and cv is not None.

    If “all”, use all data in X and y.

    If “train”, select train data from X and y using cv.split().

    If “test”, select test data from X and y using cv.split().

    If “original”, use raw eval_set.

    If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.

  • subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html

  • plot_kws (dict, optional) – Additional parameters passed to matplotlib.axes.Axes.plot(), e.g. alpha. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.plot.html

  • scatter_kws (dict, optional) – Additional parameters passed to seaborn.scatterplot(), e.g. alpha. See https://seaborn.pydata.org/generated/seaborn.scatterplot.html

  • legend_kws (dict) – Additional parameters passed to matplotlib.axes.Axes.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html
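
The idea behind average_plot, sweeping one explanatory variable while holding the others at an aggregated value, can be sketched as the construction of the input rows that would be fed to a fitted estimator. The helper name, the evenly spaced sweep between the observed minimum and maximum, and the n_points argument are all assumptions made for illustration.

```python
import statistics

def average_plot_inputs(rows, x_axis_index, aggregate="mean", n_points=5):
    """Build prediction inputs: one column swept, the rest fixed to an aggregate."""
    agg = {"mean": statistics.fmean, "median": statistics.median}[aggregate]
    columns = list(zip(*rows))
    fixed = [agg(col) for col in columns]
    lo, hi = min(columns[x_axis_index]), max(columns[x_axis_index])
    step = (hi - lo) / (n_points - 1)
    grid = []
    for i in range(n_points):
        row = list(fixed)
        row[x_axis_index] = lo + i * step  # only the swept variable changes
        grid.append(row)
    return grid

rows = [[1.0, 10.0, 100.0], [2.0, 20.0, 400.0], [3.0, 60.0, 100.0]]
grid_median = average_plot_inputs(rows, x_axis_index=0, aggregate="median", n_points=3)
grid_mean = average_plot_inputs(rows, x_axis_index=0, aggregate="mean", n_points=3)
print(grid_median)
print(grid_mean)
```

Passing such rows to predict() gives the line drawn for the swept variable. Choosing 'median' instead of 'mean' makes the fixed values less sensitive to outliers, as the third column shows.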

classmethod linear_plot(x: str, y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colname: Optional[str] = None, ax=None, hue=None, linecolor='red', rounddigit=5, plot_scores=True, scatter_kws=None, legend_kws=None)

Plot linear regression line and calculate Pearson correlation coefficient.

Parameters
  • x (str) – Variable that specifies positions on the x-axis.

  • y (str) – Variable that specifies positions on the y-axis.

  • data (pd.DataFrame) – Input data structure.

  • x_colname (str, optional) – Name of the explanatory variable. Available only if data is NOT pd.DataFrame.

  • ax (matplotlib.axes.Axes, optional) – Pre-existing axes for the plot. Otherwise, call matplotlib.pyplot.gca() internally.

  • hue (str, optional) – Grouping variable that will produce points with different colors.

  • linecolor (str, optional) – Color of regression line. See https://matplotlib.org/stable/gallery/color/named_colors.html

  • rounddigit (int, optional) – Number of decimal digits to which the displayed scores are rounded.

  • plot_scores (bool, optional) – If True, display Pearson correlation coefficient and the p-value.

  • scatter_kws (dict, optional) – Additional parameters passed to sns.scatterplot(), e.g. alpha. See https://seaborn.pydata.org/generated/seaborn.scatterplot.html

  • legend_kws (dict, optional) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html

Returns

ax – Returns the Axes object with the plot drawn onto it.

Return type

matplotlib.axes.Axes
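
The two quantities named above, the regression line and the Pearson correlation coefficient, can be illustrated with a minimal standard-library sketch. It is not the library's implementation: the helper name pearson_and_line and the sample data are hypothetical, and the p-value shown by plot_scores is not computed here.

```python
import math


def pearson_and_line(xs, ys):
    """Return (r, slope, intercept) for paired samples xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)      # Pearson correlation coefficient
    slope = sxy / sxx                   # ordinary least-squares slope
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


rounddigit = 5  # mirrors the documented default of linear_plot
r, slope, intercept = pearson_and_line([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
print(round(r, rounddigit), round(slope, rounddigit), round(intercept, rounddigit))
```

Rounding is applied only when the values are displayed, which is the role rounddigit plays in the description above.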

classmethod regression_heat_plot(estimator, x: List[str], y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colnames: Optional[List[str]] = None, x_heat: Optional[List[str]] = None, scatter_hue=None, pair_sigmarange=1.5, pair_sigmainterval=0.5, heat_extendsigma=0.5, heat_division=30, color_extendsigma=0.5, plot_scatter='true', rounddigit_rank=3, rounddigit_x1=2, rounddigit_x2=2, rounddigit_x3=2, rank_number=None, rank_col=None, cv=None, cv_seed=42, cv_group=None, display_cv_indices=0, estimator_params=None, fit_params=None, eval_set_selection=None, subplot_kws=None, heat_kws=None, scatter_kws=None, legend_kws=None)

Plot regression heatmaps of any scikit-learn regressor with 2 to 4D explanatory variables.

Parameters
  • estimator (estimator object implementing fit) – Regression estimator. This is assumed to implement the scikit-learn estimator interface.

  • x (list[str] or np.ndarray) – Explanatory variables. Should be list[str] if data is pd.DataFrame. Should be np.ndarray if data is None.

  • y (str or np.ndarray) – Objective variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None.

  • data (pd.DataFrame) – Input data structure.

  • x_colnames (list[str], optional) – Names of explanatory variables. Available only if data is not a pd.DataFrame.

  • x_heat (list[str], optional) – X-axis and y-axis variables of the heatmap. If None, the first two variables in x are used.

  • scatter_hue (str, optional) – Grouping variable that will produce points with different colors. Available only if plot_scatter is set to ‘hue’.

  • pair_sigmarange (float, optional) – Set the range of subplots. The lower limit is mean({x3, x4}) - pair_sigmarange * std({x3, x4}). The higher limit is mean({x3, x4}) + pair_sigmarange * std({x3, x4}). Available only if len(x) is greater than 2.

  • pair_sigmainterval (float, optional) – Set the interval of subplots. For example, if pair_sigmainterval is set to 0.5 and pair_sigmarange is set to 1.0, the ranges of subplots are lower than μ-1σ, μ-1σ to μ-0.5σ, μ-0.5σ to μ, μ to μ+0.5σ, μ+0.5σ to μ+1σ, and higher than μ+1σ. Available only if len(x) is greater than 2.

  • heat_extendsigma (float, optional) – Set the axis view limits of the heatmap. The lower limit is min({x1, x2}) - std({x1, x2}) * heat_extendsigma. The higher limit is max({x1, x2}) + std({x1, x2}) * heat_extendsigma.

  • heat_division (int, optional) – Resolution of the heatmap.

  • color_extendsigma (float, optional) – Set the colormap limits of the heatmap. The lower limit is min(y_true) - std(y_true) * color_extendsigma. The higher limit is max(y_true) + std(y_true) * color_extendsigma.

  • plot_scatter ({'error', 'true', 'hue'}, optional) – Determines how scatter points are colored. If ‘error’, colors are mapped from the error value. If ‘true’, colors are mapped from the y_true value. If ‘hue’, colors are mapped from the scatter_hue variable. If None, no scatter points are drawn.

  • rounddigit_rank (int, optional) – Number of decimal digits to which the errors of the emphasized data points (those with the largest regression errors) are rounded.

  • rounddigit_x1 (int, optional) – Number of decimal digits to which the x-axis variable of the heatmap is rounded.

  • rounddigit_x2 (int, optional) – Number of decimal digits to which the y-axis variable of the heatmap is rounded.

  • rounddigit_x3 (int, optional) – Number of decimal digits to which the y-axis variable of the subplots is rounded.

  • rank_number (int, optional) – Number of data points with the largest regression errors to emphasize.

  • rank_col (str, optional) – Variable displayed alongside the emphasized data points with the largest regression errors.

  • cv (int, cross-validation generator, or an iterable, optional) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, specifies the number of folds in a KFold.

  • cv_seed (int, optional) – Seed for random number generator of cross validation.

  • cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to groups argument of cv.split().

  • display_cv_indices (int or list, optional) – Cross validation index or indices to display.

  • estimator_params (dict, optional) – Parameters passed to the regression estimator. If the estimator is a pipeline, each parameter name must be prefixed so that parameter p for step s has key s__p.

  • fit_params (dict, optional) – Parameters passed to the fit() method of the regression estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor. If the estimator is a pipeline, each parameter name must be prefixed so that parameter p for step s has key s__p.

  • eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –

    Select data passed to eval_set in fit_params. Available only if “estimator” is LightGBM or XGBoost.

    If “all”, use all data in X and y.

    If “train”, select train data from X and y using cv.split().

    If “test”, select test data from X and y using cv.split().

    If “original”, use raw eval_set.

    If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.

  • subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html

  • heat_kws (dict, optional) – Additional parameters passed to sns.heatmap(), e.g. cmap. See https://seaborn.pydata.org/generated/seaborn.heatmap.html

  • scatter_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.scatter(), e.g. alpha. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html

  • legend_kws (dict, optional) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html
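
The subplot ranges described for pair_sigmarange and pair_sigmainterval can be worked through with a short standard-library sketch. This is an illustration of the arithmetic only, not the library's code: the helper name subplot_ranges is hypothetical, and the population standard deviation is assumed because the documentation does not say which estimator std() refers to.

```python
import statistics


def subplot_ranges(values, pair_sigmarange=1.5, pair_sigmainterval=0.5):
    """Return (lower, upper) bounds of each subplot range; None means unbounded."""
    mu = statistics.fmean(values)
    # Assumption: population standard deviation (the default of numpy.std).
    sigma = statistics.pstdev(values)
    # Number of intervals between mu - range*sigma and mu + range*sigma.
    steps = round(2 * pair_sigmarange / pair_sigmainterval)
    edges = [
        mu + (-pair_sigmarange + i * pair_sigmainterval) * sigma
        for i in range(steps + 1)
    ]
    # Open-ended ranges below the first edge and above the last edge.
    bounds = [None] + edges + [None]
    return list(zip(bounds[:-1], bounds[1:]))


# The documented example: pair_sigmarange=1.0 and pair_sigmainterval=0.5.
for lower, upper in subplot_ranges([0.0, 2.0], 1.0, 0.5):
    print(lower, upper)
```

With these settings the sketch yields six ranges, matching the six listed in the pair_sigmainterval description.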

classmethod regression_plot_1d(estimator, x: str, y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colname: Optional[str] = None, hue=None, linecolor='red', rounddigit=3, rank_number=None, rank_col=None, scores='mae', cv_stats='mean', cv=None, cv_seed=42, cv_group=None, estimator_params=None, fit_params=None, eval_set_selection=None, subplot_kws=None, scatter_kws=None, legend_kws=None)

Plot regression lines of any scikit-learn regressor with 1D explanatory variable.

Parameters
  • estimator (estimator object implementing fit) – Regression estimator. This is assumed to implement the scikit-learn estimator interface.

  • x (str or np.ndarray) – Explanatory variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None.

  • y (str or np.ndarray) – Objective variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None.

  • data (pd.DataFrame) – Input data structure.

  • x_colname (str, optional) – Name of the explanatory variable. Available only if data is not a pd.DataFrame.

  • hue (str, optional) – Grouping variable that will produce points with different colors.

  • linecolor (str, optional) – Color of prediction = true line. See https://matplotlib.org/stable/gallery/color/named_colors.html

  • rounddigit (int, optional) – Number of decimal digits to which the displayed scores are rounded.

  • rank_number (int, optional) – Number of data points with the largest regression errors to emphasize.

  • rank_col (list[str], optional) – Variables displayed alongside the emphasized data points with the largest regression errors.

  • scores ({'r2', 'mae', 'mse', 'rmse', 'rmsle', 'mape', 'max_error'} or list, optional) – Regression scores that are displayed at the lower right of the graph.

  • cv_stats ({'mean', 'median', 'max', 'min'}, optional) – Statistic used to summarize the cross-validation scores that are displayed at the lower right of the graph.

  • cv (int, cross-validation generator, or an iterable, optional) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, specifies the number of folds in a KFold.

  • cv_seed (int, optional) – Seed for random number generator of cross validation.

  • cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to groups argument of cv.split().

  • estimator_params (dict, optional) – Parameters passed to the regression estimator. If the estimator is a pipeline, each parameter name must be prefixed so that parameter p for step s has key s__p.

  • fit_params (dict, optional) – Parameters passed to the fit() method of the regression estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor. If the estimator is a pipeline, each parameter name must be prefixed so that parameter p for step s has key s__p.

  • subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html

  • eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –

    Select data passed to eval_set in fit_params. Available only if “estimator” is LightGBM or XGBoost.

    If “all”, use all data in X and y.

    If “train”, select train data from X and y using cv.split().

    If “test”, select test data from X and y using cv.split().

    If “original”, use raw eval_set.

    If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.

  • scatter_kws (dict, optional) – Additional parameters passed to sns.scatterplot(), e.g. alpha. See https://seaborn.pydata.org/generated/seaborn.scatterplot.html

  • legend_kws (dict, optional) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html

Returns

score_dict – Validation scores, e.g. r2, mae and rmse

Return type

dict
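
As a rough illustration of how per-fold scores and cv_stats relate to the returned score_dict, the following standard-library sketch computes a few of the listed scores for each fold and then summarizes them. It is an assumption about the general shape of the computation, not the library's implementation; the helper names regression_scores and summarize_folds and the sample folds are hypothetical.

```python
import math
import statistics


def regression_scores(y_true, y_pred):
    """Compute a subset of the documented scores for one cross-validation fold."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - (mse * n) / ss_tot,  # 1 - SS_res / SS_tot
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "max_error": max(abs(e) for e in errors),
    }


# One aggregation function per documented cv_stats option.
CV_STATS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "max": max,
    "min": min,
}


def summarize_folds(fold_scores, cv_stats="mean"):
    """Collapse a list of per-fold score dicts into one dict."""
    agg = CV_STATS[cv_stats]
    return {name: agg([fold[name] for fold in fold_scores]) for name in fold_scores[0]}


folds = [
    regression_scores([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]),
    regression_scores([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
]
score_dict = summarize_folds(folds, cv_stats="mean")
print({name: round(value, 3) for name, value in score_dict.items()})
```

Choosing 'max' or 'min' instead of 'mean' reports the worst or best fold for each score, which can be more informative when fold results vary widely.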

classmethod regression_pred_true(estimator, x: List[str], y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colnames: Optional[List[str]] = None, hue=None, linecolor='red', rounddigit=3, rank_number=None, rank_col=None, scores='mae', cv_stats='mean', cv=None, cv_seed=42, cv_group=None, ax=None, estimator_params=None, fit_params=None, eval_set_selection=None, subplot_kws=None, scatter_kws=None, legend_kws=None)

Plot prediction vs. true scatter plots of any scikit-learn regression estimator.

Parameters
  • estimator (estimator object implementing fit) – Regression estimator. This is assumed to implement the scikit-learn estimator interface.

  • x (str or list[str]) – Explanatory variables.

  • y (str) – Objective variable.

  • data (pd.DataFrame) – Input data structure.

  • x_colnames (list[str], optional) – Names of explanatory variables. Available only if data is not a pd.DataFrame.

  • hue (str, optional) – Grouping variable that will produce points with different colors.

  • linecolor (str, optional) – Color of prediction = true line. See https://matplotlib.org/stable/gallery/color/named_colors.html

  • rounddigit (int, optional) – Number of decimal digits to which the displayed scores are rounded.

  • rank_number (int, optional) – Number of data points with the largest regression errors to emphasize.

  • rank_col (list[str], optional) – Variables displayed alongside the emphasized data points with the largest regression errors.

  • scores ({'r2', 'mae', 'mse', 'rmse', 'rmsle', 'mape', 'max_error'} or list, optional) – Regression scores that are displayed at the lower right of the graph.

  • cv_stats ({'mean', 'median', 'max', 'min'}, optional) – Statistic used to summarize the cross-validation scores that are displayed at the lower right of the graph.

  • cv (int, cross-validation generator, or an iterable, optional) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, specifies the number of folds in a KFold.

  • cv_seed (int, optional) – Seed for random number generator of cross validation.

  • cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to groups argument of cv.split().

  • ax ({matplotlib.axes.Axes, list[matplotlib.axes.Axes]}, optional) – Pre-existing axes for the plot, or a list of them. Otherwise, call matplotlib.pyplot.subplot() internally.

  • estimator_params (dict, optional) – Parameters passed to the regression estimator. If the estimator is a pipeline, each parameter name must be prefixed so that parameter p for step s has key s__p.

  • fit_params (dict, optional) – Parameters passed to the fit() method of the regression estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor. If the estimator is a pipeline, each parameter name must be prefixed so that parameter p for step s has key s__p.

  • eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –

    Select data passed to eval_set in fit_params. Available only if estimator is LightGBM or XGBoost and cv is not None.

    If “all”, use all data in X and y.

    If “train”, select train data from X and y using cv.split().

    If “test”, select test data from X and y using cv.split().

    If “original”, use raw eval_set.

    If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.

  • subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. Available only if ax is None. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html

  • scatter_kws (dict, optional) – Additional parameters passed to sns.scatterplot(), e.g. alpha. See https://seaborn.pydata.org/generated/seaborn.scatterplot.html

  • legend_kws (dict, optional) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html

Returns

score_dict – Validation scores, e.g. r2, mae and rmse

Return type

dict
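
The s__p key convention required for estimator_params and fit_params when the estimator is a pipeline can be sketched in plain Python. The helper prefix_for_step, the step name "reg", and the parameter values are all hypothetical and serve only to show the key format; they are not part of seaborn_analyzer.

```python
def prefix_for_step(step_name, params):
    """Return a copy of params keyed as '<step>__<param>' for one pipeline step."""
    return {f"{step_name}__{key}": value for key, value in params.items()}


# Hypothetical pipeline step named "reg"; the values are placeholders.
raw_fit_params = {"verbose": 0, "eval_set": [([[0.0], [1.0]], [0.0, 1.0])]}
fit_params = prefix_for_step("reg", raw_fit_params)
print(sorted(fit_params))
```

The original dictionary is left unchanged, so the same unprefixed parameters can still be used with a bare estimator that is not wrapped in a pipeline.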