seaborn_analyzer package¶
seaborn_analyzer.custom_hist_plot module¶
- class seaborn_analyzer.custom_hist_plot.hist¶
Bases: object
- classmethod fit_dist(data: pandas.core.frame.DataFrame, x: Optional[str] = None, hue=None, dist='norm', ax=None, binwidth=None, bins='auto', norm_hist=True, floc=None, sigmarange=4, linecolor='red', linesplit=200, hist_kws={})¶
Fit distributions by maximum likelihood estimation, and calculate fitting scores.
- Parameters
data (pd.DataFrame, pd.Series, or np.ndarray) – Input data structure. Either a long-form collection of vectors that can be assigned to named variables or a wide-form dataset that will be internally reshaped.
x (str, optional) – Variable that specifies positions on the x-axis. Available only if data is pd.DataFrame.
hue (str, pd.Series, or np.ndarray, optional) – Semantic variable that is mapped to determine the color of plot elements. If data is pd.DataFrame, the argument must be a key in data.
dist ({'norm', 'lognorm', 'gamma', 't', 'expon', 'uniform', 'chi2', 'weibull'} or list, optional) – Type of fitting distribution or list of distributions.
ax (matplotlib.axes.Axes, optional) – Pre-existing axes for the plot. Otherwise, call matplotlib.pyplot.gca() internally.
binwidth (float, optional) – Width of each bin, overrides bins.
bins (int, optional) – Generic bin parameter that can be the name of a reference rule, the number of bins, or the breaks of the bins. Passed to numpy.histogram_bin_edges().
norm_hist (bool, optional) – If True, the histogram height shows a density rather than a count.
floc (float, optional) – Hold the location parameter fixed to the specified value. If None, the location parameter is fitted by maximum likelihood estimation except when dist is ‘weibull’ or ‘expon’. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.fit.html#scipy.stats.rv_continuous.fit
sigmarange (float, optional) – Set the x-axis view limits. The lower limit is -sigmarange * std(data) + mean(data). The higher limit is sigmarange * std(data) + mean(data).
linecolor (str or List[str], optional) – Color of fitting line or colors of fitting lines. See https://matplotlib.org/stable/gallery/color/named_colors.html
linesplit (int, optional) – Number of fitting line divisions.
hist_kws (dict, optional) – Additional parameters passed to seaborn.histplot() other than the above arguments.
- Returns
all_params (dict) – Parameters estimated by maximum likelihood estimation.
all_scores (dict) – Fitting scores, which consist of RSS, AIC, and BIC.
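The scores named above can be illustrated for the simplest case, a normal distribution. The following is a minimal sketch using only the standard library and the textbook definitions of AIC and BIC; the helper names are hypothetical, and the exact formulas that fit_dist uses internally are not stated in this reference.

```python
import math

def fit_normal_mle(samples):
    """Return (mu, sigma) maximizing the normal likelihood of `samples`."""
    n = len(samples)
    mu = sum(samples) / n
    # The maximum likelihood estimate of the variance divides by n, not n - 1.
    var = sum((v - mu) ** 2 for v in samples) / n
    return mu, math.sqrt(var)

def normal_log_likelihood(samples, mu, sigma):
    """Sum of the log density of N(mu, sigma) over all samples."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2)
        for v in samples
    )

def information_criteria(log_lik, n_params, n_samples):
    """Textbook AIC and BIC; lower values indicate a better trade-off."""
    aic = 2 * n_params - 2 * log_lik
    bic = n_params * math.log(n_samples) - 2 * log_lik
    return aic, bic

samples = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
mu, sigma = fit_normal_mle(samples)
log_lik = normal_log_likelihood(samples, mu, sigma)
# A normal distribution has two fitted parameters: location and scale.
aic, bic = information_criteria(log_lik, n_params=2, n_samples=len(samples))
print(f"mu={mu:.3f}, sigma={sigma:.3f}, AIC={aic:.3f}, BIC={bic:.3f}")
```

Comparing AIC or BIC across candidate distributions fitted to the same data is the usual way to use such scores: the candidate with the lowest value is preferred.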
- classmethod plot_normality(data: pandas.core.frame.DataFrame, x: Optional[str] = None, hue=None, binwidth=None, bins='auto', norm_hist=False, sigmarange=4, linesplit=200, rounddigit=5, hist_kws={}, subplot_kws={})¶
Plot normality test result and QQ plot.
- Parameters
data (pd.DataFrame, pd.Series, or np.ndarray) – Input data structure. Either a long-form collection of vectors that can be assigned to named variables or a wide-form dataset that will be internally reshaped.
x (str, optional) – Variable that specifies positions on the x-axis. Available only if data is pd.DataFrame.
hue (str, optional) – Semantic variable that is mapped to determine the color of plot elements. Available only if data is pd.DataFrame.
binwidth (float, optional) – Width of each bin, overrides bins.
bins (int, optional) – Generic bin parameter that can be the name of a reference rule, the number of bins, or the breaks of the bins. Passed to numpy.histogram_bin_edges().
norm_hist (bool, optional) – If True, the histogram height shows a density rather than a count.
sigmarange (float, optional) – Set the x-axis view limits. The lower limit is -sigmarange * std(data) + mean(data). The higher limit is sigmarange * std(data) + mean(data).
linesplit (int, optional) – Number of fitting line divisions.
rounddigit (int, optional) – Number of decimal digits to which the scores are rounded.
hist_kws (dict, optional) – Additional parameters passed to seaborn.histplot() other than the above arguments.
subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
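For readers unfamiliar with QQ plots, the sketch below shows how the plotted points can be derived: each sorted sample is paired with the standard normal quantile at the same rank. It uses only the standard library; the function name and the plotting positions (i + 0.5) / n are assumptions, not the library's documented choice.

```python
from statistics import NormalDist

def normal_qq_points(samples):
    """Pair each sorted sample with the matching standard normal quantile."""
    ordered = sorted(samples)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i + 0.5) / n are an assumed convention.
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

for quantile, value in normal_qq_points([3.0, 1.0, 2.0]):
    print(f"{quantile:+.3f} -> {value}")
```

If the data are close to normal, the points fall near a straight line; systematic curvature suggests skew or heavy tails.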
seaborn_analyzer.custom_pair_plot module¶
- class seaborn_analyzer.custom_pair_plot.CustomPairPlot¶
Bases: object
- pairanalyzer(df, hue=None, palette=None, vars=None, lowerkind='boxscatter', diag_kind='kde', markers=None, height=2.5, aspect=1, dropna=True, lower_kws={}, diag_kws={}, grid_kws={})¶
Plot a pair plot that shows scatter plots and a correlation coefficient matrix simultaneously. This method mainly uses the seaborn.PairGrid class.
- Parameters
df (pd.DataFrame) – Input data structure. Int, float, and bool columns are displayed in the output graph.
hue (str) – Variable in data to map plot aspects to different colors.
palette (str or dict[str]) – Set of colors for mapping the hue variable. If a dict, keys should be values in the hue variable.
vars (list[str]) – Variables within data to use, otherwise use every column with a numeric datatype.
lowerkind ({'boxscatter', 'scatter', or 'reg'}) – Kind of plot for the lower triangular subplots.
diag_kind ({'kde' or 'hist'}) – Kind of plot for the diagonal subplots.
markers (str or list[str]) – Marker to use for all scatterplot points or a list of markers. See https://matplotlib.org/stable/api/markers_api.html
height (float) – Height (in inches) of each facet.
aspect (float) – Aspect * height gives the width (in inches) of each facet.
dropna (bool) – Drop missing values from the data before plotting.
lower_kws (dict) – Additional parameters passed to seaborn.PairGrid.map_lower(). If lowerkind is ‘scatter’, the arguments are applied to the seaborn.scatterplot method of the lower subplots.
diag_kws (dict) – Additional parameters passed to seaborn.PairGrid.map_diag(). If diag_kind is ‘kde’, the arguments are applied to the seaborn.kdeplot method of the diagonal subplots.
grid_kws (dict) – Additional parameters passed to seaborn.PairGrid.__init__() other than the above arguments. See https://seaborn.pydata.org/generated/seaborn.PairGrid.html
seaborn_analyzer.custom_class_plot module¶
- class seaborn_analyzer.custom_class_plot.classplot¶
Bases: object
- classmethod class_proba_plot(clf, x: List[str], y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colnames: Optional[List[str]] = None, x_chart: Optional[List[str]] = None, pair_sigmarange=1.0, pair_sigmainterval=0.5, chart_extendsigma=0.5, chart_scale=1, plot_border=True, plot_scatter='class', rounddigit_x3=2, proba_class=None, proba_cmap_dict=None, proba_type='contourf', scatter_colors=None, true_marker='o', false_marker='x', cv=None, cv_seed=42, cv_group=None, display_cv_indices=0, clf_params=None, fit_params=None, eval_set_selection=None, subplot_kws=None, contourf_kws=None, imshow_kws=None, scatter_kws=None, legend_kws=None)¶
Plot class prediction probability of any scikit-learn classifier with 2 to 4D explanatory variables.
- Parameters
clf (classifier object implementing fit) – Classifier. This is assumed to implement the scikit-learn estimator interface.
x (list[str] or np.ndarray) – Explanatory variables. Should be list[str] if data is pd.DataFrame. Should be np.ndarray if data is None.
y (str or np.ndarray) – Objective variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None.
data (pd.DataFrame, optional) – Input data structure.
x_colnames (list[str], optional) – Names of explanatory variables. Available only if data is NOT pd.DataFrame.
x_chart (list[str], optional) – X-axis and y-axis variables of the separation map. If None, use the first two variables in x.
pair_sigmarange (float, optional) – Set the range of subplots. The lower limit is mean({x3, x4}) - pair_sigmarange * std({x3, x4}). The higher limit is mean({x3, x4}) + pair_sigmarange * std({x3, x4}). Available only if len(x) is bigger than 2.
pair_sigmainterval (float, optional) – Set the interval of subplots. For example, if pair_sigmainterval is set to 0.5 and pair_sigmarange is set to 1.0, the ranges of subplots are lower than μ-1σ, μ-1σ to μ-0.5σ, μ-0.5σ to μ, μ to μ+0.5σ, μ+0.5σ to μ+1σ, and higher than μ+1σ. Available only if len(x) is bigger than 2.
chart_extendsigma (float, optional) – Set the axis view limits of the separation map. The lower limit is min({x1, x2}) - std({x1, x2}) * chart_extendsigma. The higher limit is max({x1, x2}) + std({x1, x2}) * chart_extendsigma.
chart_scale (int, optional) – Set the resolution of the separation lines. If plotting is slow, we recommend setting chart_scale to 2. We DON’T recommend setting it to larger than 3 because of jaggies.
plot_border (bool, optional) – If True, display class separation lines.
plot_scatter ({'error', 'class', 'class_error', None}, optional) – Color decision of scatter plot. If ‘error’, colors are mapped using true-false. If ‘class’, colors are mapped using class labels. If ‘class_error’, colors are mapped using class labels and marker styles using true-false. If None, no scatter plot is drawn.
rounddigit_x3 (int, optional) – Number of decimal digits to which the y-axis variable of subplots is rounded.
proba_class (str or list[str], optional) – Class label name, in which probability map is displayed.
proba_cmap_dict (dict[str, str], optional) – Colormap of probability map. The keys must be class label name and the values must be colormap names in Matplotlib. See https://matplotlib.org/stable/tutorials/colors/colormaps.html
proba_type ({'contourf', 'contour', 'imshow'}, optional) – Plotting type of probability map. If ‘contourf’, mapped by matplotlib.pyplot.contourf(). If ‘contour’, mapped by matplotlib.pyplot.contour(). If ‘imshow’, mapped by matplotlib.pyplot.imshow(). ‘imshow’ is available only if the number of class labels is less than 4.
scatter_colors (list[str], optional) – Set of colors for mapping the class labels. Available only if plot_scatter is set to ‘class’ or ‘class_error’.
true_marker (str, optional) – Marker style of True label. Available only if plot_scatter is set to ‘error’ or ‘class_error’. See https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers
false_marker (str, optional) – Marker style of False label. Available only if plot_scatter is set to ‘error’ or ‘class_error’. See https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers
cv (int or sklearn.model_selection.*, optional) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, it specifies the number of folds in a KFold.
cv_seed (int, optional) – Seed for random number generator of cross validation.
cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to the groups argument of cv.split().
display_cv_indices (int, optional) – Cross validation index or indices to display.
clf_params (dict, optional) – Parameters passed to the classifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
fit_params (dict, optional) – Parameters passed to the fit() method of the classifier, e.g. early_stopping_round and eval_set of XGBClassifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –
Select data passed to eval_set in fit_params. Available only if “estimator” is LightGBM or XGBoost.
If “all”, use all data in X and y.
If “train”, select train data from X and y using cv.split().
If “test”, select test data from X and y using cv.split().
If “original”, use raw eval_set.
If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.
subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
contourf_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.contourf() if proba_type is set to ‘contourf’, or additional parameters passed to matplotlib.pyplot.contour() if proba_type is set to ‘contour’. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.contourf.html or https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.contour.html
imshow_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.imshow(), e.g. alpha. Available only if proba_type is set to ‘imshow’. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html
scatter_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.scatter(), e.g. alpha. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html
legend_kws (dict) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html
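The way pair_sigmarange and pair_sigmainterval split the third and fourth explanatory variables into subplot ranges can be sketched as follows. The helper names are hypothetical and the code only mirrors the parameter descriptions; it is not taken from the library's implementation.

```python
def subplot_bounds(mean, std, pair_sigmarange=1.0, pair_sigmainterval=0.5):
    """Return the interior boundaries that split a variable into subplot ranges."""
    steps = round(2 * pair_sigmarange / pair_sigmainterval)
    return [
        mean + (-pair_sigmarange + i * pair_sigmainterval) * std
        for i in range(steps + 1)
    ]

def subplot_ranges(bounds):
    """Turn boundaries into (lower, upper) ranges; None marks an open end."""
    edges = [None] + bounds + [None]
    return list(zip(edges[:-1], edges[1:]))

# With mean 0 and std 1 the boundaries are expressed directly in units of sigma.
bounds = subplot_bounds(mean=0.0, std=1.0)
for lower, upper in subplot_ranges(bounds):
    print(lower, "to", upper)
```

With the defaults this yields six ranges, matching the example in the pair_sigmainterval description: one open range below μ-1σ, four closed ranges of width 0.5σ, and one open range above μ+1σ.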
- classmethod class_separator_plot(clf, x: List[str], y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colnames: Optional[List[str]] = None, x_chart: Optional[List[str]] = None, pair_sigmarange=1.0, pair_sigmainterval=0.5, chart_extendsigma=0.5, chart_scale=1, plot_scatter='class_error', rounddigit_x3=2, scatter_colors=None, true_marker='o', false_marker='x', cv=None, cv_seed=42, cv_group=None, display_cv_indices=0, clf_params=None, fit_params=None, eval_set_selection=None, subplot_kws=None, contourf_kws=None, scatter_kws=None, legend_kws=None)¶
Plot class separation lines of any scikit-learn classifier with 2 to 4D explanatory variables.
- Parameters
clf (classifier object implementing fit) – Classifier. This is assumed to implement the scikit-learn estimator interface.
x (list[str] or np.ndarray) – Explanatory variables. Should be list[str] if data is pd.DataFrame. Should be np.ndarray if data is None.
y (str or np.ndarray) – Objective variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None.
data (pd.DataFrame) – Input data structure.
x_colnames (list[str], optional) – Names of explanatory variables. Available only if data is NOT pd.DataFrame.
x_chart (list[str], optional) – X-axis and y-axis variables of the separation map. If None, use the first two variables in x.
pair_sigmarange (float, optional) – Set the range of subplots. The lower limit is mean({x3, x4}) - pair_sigmarange * std({x3, x4}). The higher limit is mean({x3, x4}) + pair_sigmarange * std({x3, x4}). Available only if len(x) is bigger than 2.
pair_sigmainterval (float, optional) – Set the interval of subplots. For example, if pair_sigmainterval is set to 0.5 and pair_sigmarange is set to 1.0, the ranges of subplots are lower than μ-1σ, μ-1σ to μ-0.5σ, μ-0.5σ to μ, μ to μ+0.5σ, μ+0.5σ to μ+1σ, and higher than μ+1σ. Available only if len(x) is bigger than 2.
chart_extendsigma (float, optional) – Set the axis view limits of the separation map. The lower limit is min({x1, x2}) - std({x1, x2}) * chart_extendsigma. The higher limit is max({x1, x2}) + std({x1, x2}) * chart_extendsigma.
chart_scale (int, optional) – Set the resolution of the separation lines. If plotting is slow, we recommend setting chart_scale to 2. We DON’T recommend setting it to larger than 3 because of jaggies.
plot_scatter ({'error', 'class', 'class_error', None}, optional) – Color decision of scatter plot. If ‘error’, colors are mapped using true-false. If ‘class’, colors are mapped using class labels. If ‘class_error’, colors are mapped using class labels and marker styles using true-false. If None, no scatter plot is drawn.
rounddigit_x3 (int, optional) – Number of decimal digits to which the y-axis variable of subplots is rounded.
scatter_colors (list[str], optional) – Set of colors for mapping the class labels. Available only if plot_scatter is set to ‘class’ or ‘class_error’.
true_marker (str, optional) – Marker style of True label. Available only if plot_scatter is set to ‘error’ or ‘class_error’. See https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers
false_marker (str, optional) – Marker style of False label. Available only if plot_scatter is set to ‘error’ or ‘class_error’. See https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers
cv (int, cross-validation generator, or an iterable, optional) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, it specifies the number of folds in a KFold.
cv_seed (int, optional) – Seed for random number generator of cross validation.
cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to the groups argument of cv.split().
display_cv_indices (int, optional) – Cross validation index or indices to display.
clf_params (dict, optional) – Parameters passed to the classifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
fit_params (dict, optional) – Parameters passed to the fit() method of the classifier, e.g. early_stopping_round and eval_set of XGBClassifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –
Select data passed to eval_set in fit_params. Available only if “estimator” is LightGBM or XGBoost.
If “all”, use all data in X and y.
If “train”, select train data from X and y using cv.split().
If “test”, select test data from X and y using cv.split().
If “original”, use raw eval_set.
If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.
subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
contourf_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.contourf(), e.g. alpha. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.contourf.html
scatter_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.scatter(), e.g. alpha. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html
legend_kws (dict) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html
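The axis limits described for chart_extendsigma, and an evenly spaced axis over them, can be sketched as below. The function names and the n_points argument are hypothetical, and the use of the population standard deviation is an assumption; how chart_scale maps to the number of grid points is not covered here.

```python
from statistics import pstdev

def extended_limits(values, extendsigma=0.5):
    """Pad the data range by extendsigma standard deviations on each side."""
    spread = pstdev(values)  # population std; the library's choice is an assumption
    return min(values) - spread * extendsigma, max(values) + spread * extendsigma

def grid_axis(values, extendsigma=0.5, n_points=5):
    """Evenly spaced coordinates covering the extended limits."""
    lower, upper = extended_limits(values, extendsigma)
    step = (upper - lower) / (n_points - 1)
    return [lower + i * step for i in range(n_points)]

x1 = [0.0, 2.0, 4.0]
print(extended_limits(x1))
print(grid_axis(x1))
```

Evaluating a classifier on the Cartesian product of two such axes is one common way to obtain the values a separation map is drawn from.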
- classmethod plot_roc_curve_multiclass(estimator, X_train, y_train, *, X_test=None, y_test=None, sample_weight=None, drop_intermediate=True, response_method='predict_proba', name=None, ax=None, pos_label=None, average='macro', fit_params=None, plot_roc_kws=None, class_average_kws=None)¶
Plot Receiver operating characteristic (ROC) curve.
Available for both multiclass and binary classification.
Extra keyword arguments will be passed to matplotlib’s plot.
- Parameters
estimator (estimator instance) – Fitted classifier or a fitted Pipeline in which the last estimator is a classifier.
X_train ({array-like, sparse matrix} of shape (n_samples, n_features)) – Input values of train data.
y_train (array-like of shape (n_samples,)) – Target values of train data.
X_test ({array-like, sparse matrix} of shape (n_samples, n_features)) – Input values of test data.
y_test (array-like of shape (n_samples,)) – Target values of test data.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
drop_intermediate (boolean, default=True) – Whether to drop some suboptimal thresholds which would not appear on a plotted ROC curve. This is useful in order to create lighter ROC curves.
response_method ({'predict_proba', 'decision_function'}, default='predict_proba') – Specifies which method to use for calculating class probability.
name (str, default=None) – Name of ROC Curve for labeling. If None, use the name of the estimator.
ax (matplotlib axes, default=None) – Axes object to plot on. If None, a new figure and axes is created.
pos_label (str or int, default=None) – The class considered as the positive class when computing the ROC AUC metrics. By default, estimator.classes_[1] is considered as the positive class.
average ({'macro', 'micro'}, default='macro') – Specifies which method to use for calculating the average of tpr and fpr.
fit_params (dict, default=None) – Parameters passed to the fit() method of the classifier, e.g. early_stopping_round and eval_set of XGBClassifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
plot_roc_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.plot() that draws the ROC curve of each class, e.g. lw. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html
class_average_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.plot() or sklearn.metrics.plot_roc_curve() that draws the average ROC curve, e.g. alpha. See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_roc_curve.html
- Returns
display – Object that stores computed values.
- Return type
RocCurveDisplay
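The difference between the ‘macro’ and ‘micro’ options of average can be illustrated with a small pure-Python sketch. It uses the area under the curve as a compact stand-in for the averaged tpr/fpr curve, and all function names are hypothetical; the library's own averaging code is not shown in this reference.

```python
def binary_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))

def macro_auc(y_true, proba, classes):
    """Average the one-vs-rest AUC of each class with equal weight."""
    per_class = []
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        per_class.append(binary_auc(labels, [row[k] for row in proba]))
    return sum(per_class) / len(per_class)

def micro_auc(y_true, proba, classes):
    """Pool every (sample, class) pair into a single binary problem."""
    labels = [1 if y == cls else 0 for y in y_true for cls in classes]
    scores = [p for row in proba for p in row]
    return binary_auc(labels, scores)

classes = ["a", "b", "c"]
y_true = ["a", "b", "c", "a"]
# One row of class probabilities per sample, columns ordered as in `classes`.
proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.4, 0.5, 0.1],
]
macro = macro_auc(y_true, proba, classes)
micro = micro_auc(y_true, proba, classes)
print(macro, micro)
```

Macro averaging treats every class equally, while micro averaging weights every sample-class decision equally, so the two can differ when scores are not comparable across classes or when classes are imbalanced.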
- classmethod roc_plot(clf, x: List[str], y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colnames: Optional[List[str]] = None, cv=None, cv_seed=42, cv_group=None, ax=None, sample_weight=None, drop_intermediate=True, response_method='predict_proba', pos_label=None, average='macro', clf_params=None, fit_params=None, eval_set_selection=None, draw_grid=True, grid_kws=None, subplot_kws=None, legend_kws=None, plot_roc_kws=None, class_average_kws=None, cv_mean_kws=None, chance_plot_kws=None)¶
Plot Receiver operating characteristic (ROC) curve with cross validation.
Available for both binary and multiclass classification.
Extra keyword arguments will be passed to matplotlib’s plot.
- Parameters
clf (classifier object implementing fit) – Fitted classifier or a fitted Pipeline in which the last estimator is a classifier.
x (list[str] or np.ndarray) – Explanatory variables. Should be list[str] if data is pd.DataFrame. Should be np.ndarray if data is None.
y (str or np.ndarray) – Objective variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None.
data (pd.DataFrame, default=None) – Input data structure.
x_colnames (list[str], default=None) – Names of explanatory variables. Available only if data is NOT pd.DataFrame.
cv (int or sklearn.model_selection.*, default=None) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, it specifies the number of folds in a KFold.
cv_seed (int, default=42) – Seed for random number generator of cross validation.
cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to the groups argument of cv.split().
ax ({matplotlib.axes.Axes, list[matplotlib.axes.Axes]}, default=None) – Pre-existing axes for the plot, or a list of them. Otherwise, call matplotlib.pyplot.subplot() internally.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
drop_intermediate (boolean, default=True) – Whether to drop some suboptimal thresholds which would not appear on a plotted ROC curve. This is useful in order to create lighter ROC curves.
response_method ({'predict_proba', 'decision_function'}, default='predict_proba') – Specifies which method to use for calculating class probability.
pos_label (str or int, default=None) – The class considered as the positive class when computing the ROC AUC metrics. By default, estimator.classes_[1] is considered as the positive class.
average ({'macro', 'micro'}, default='macro') – Specifies which method to use for calculating the average of tpr and fpr.
clf_params (dict, default=None) – Parameters passed to the classifier. If the classifier is pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
fit_params (dict, default=None) – Parameters passed to the fit() method of the classifier, e.g. early_stopping_round and eval_set of XGBClassifier. If the classifier is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –
Select data passed to eval_set in fit_params. Available only if “estimator” is LightGBM or XGBoost.
If “all”, use all data in X and y.
If “train”, select train data from X and y using cv.split().
If “test”, select test data from X and y using cv.split().
If “original”, use raw eval_set.
If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.
draw_grid (bool, default=True) – If True, grid lines are drawn.
grid_kws (dict, default=None) – Additional parameters passed to matplotlib.pyplot.grid() that draws grid lines, e.g. color. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.grid.html
subplot_kws (dict, default=None) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. Available only if ax is None. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
legend_kws (dict) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html
plot_roc_kws (dict, default=None) – Additional parameters passed to matplotlib.pyplot.plot() that draws the ROC curve of each class, e.g. lw. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html
class_average_kws (dict, default=None) – Additional parameters passed to matplotlib.pyplot.plot() or sklearn.metrics.plot_roc_curve() that draws the average ROC curve of all classes, e.g. lw. See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_roc_curve.html
cv_mean_kws (dict, default=None) – Additional parameters passed to matplotlib.pyplot.plot() that draws the mean ROC curve of all folds of cross validation, e.g. lw. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html
chance_plot_kws (dict, default=None) – Additional parameters passed to matplotlib.pyplot.plot() that draws the chance line, e.g. lw. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html
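The s__p naming rule mentioned for clf_params and fit_params can be sketched with plain dictionaries. The helper names and the step name "svc" are hypothetical and chosen only for illustration.

```python
def prefix_for_step(step_name, params):
    """Rename each parameter p to '<step_name>__p' for use with a pipeline."""
    return {f"{step_name}__{name}": value for name, value in params.items()}

def split_step_param(key):
    """Split 'step__param' back into its step name and parameter name."""
    step, _, name = key.partition("__")
    return step, name

# "svc" is an assumed step name; use the name given to the step in your pipeline.
clf_params = prefix_for_step("svc", {"C": 1.0, "gamma": 0.1})
print(clf_params)
print(split_step_param("svc__C"))
```

When the classifier is not a pipeline, the parameter names are passed without any prefix.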
seaborn_analyzer.custom_reg_plot module¶
- class seaborn_analyzer.custom_reg_plot.regplot¶
Bases: object
- classmethod average_plot(estimator, x: List[str], y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colnames: Optional[List[str]] = None, hue=None, aggregate='mean', cv=None, cv_seed=42, cv_group=None, display_cv_indices=0, estimator_params=None, fit_params=None, eval_set_selection=None, subplot_kws=None, plot_kws=None, scatter_kws=None, legend_kws=None)¶
Plot relationship between one explanatory variable and predicted value by line graph.
Other explanatory variables are fixed to aggregated values such as mean values or median values.
- Parameters
estimator (estimator object implementing fit) – Regression estimator. This is assumed to implement the scikit-learn estimator interface.
x (list[str] or np.ndarray) – Explanatory variables. Should be list[str] if data is pd.DataFrame. Should be np.ndarray if data is None.
y (str or np.ndarray) – Objective variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None.
data (pd.DataFrame) – Input data structure.
x_colnames (list[str], optional) – Names of explanatory variables. Available only if data is NOT pd.DataFrame.
hue (str, optional) – Grouping variable that will produce points with different colors.
aggregate ({'mean', 'median'}, optional) – Statistical method for aggregating the explanatory variables other than the x-axis variable.
cv (int, cross-validation generator, or an iterable, optional) – Determines the cross-validation splitting strategy. If None, to use the default 5-fold cross validation. If int, to specify the number of folds in a KFold.
cv_seed (int, optional) – Seed for random number generator of cross validation.
cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to the groups argument of cv.split().
display_cv_indices (int or list, optional) – Cross validation index or indices to display.
estimator_params (dict, optional) – Parameters passed to the regression estimator. If the estimator is pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
fit_params (dict, optional) – Parameters passed to the fit() method of the regression estimator, e.g. early_stopping_round and eval_set of XGBRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –
Select data passed to eval_set in fit_params. Available only if estimator is LightGBM or XGBoost and cv is not None.
If “all”, use all data in X and y.
If “train”, select train data from X and y using cv.split().
If “test”, select test data from X and y using cv.split().
If “original”, use raw eval_set.
If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.
subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
plot_kws (dict, optional) – Additional parameters passed to matplotlib.axes.Axes.plot(), e.g. alpha. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.plot.html
scatter_kws (dict, optional) – Additional parameters passed to seaborn.scatterplot(), e.g. alpha. See https://seaborn.pydata.org/generated/seaborn.scatterplot.html
legend_kws (dict) – Additional parameters passed to matplotlib.axes.Axes.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html
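The idea behind average_plot, sweeping one explanatory variable while holding the others at an aggregated value, can be sketched as follows. This is a minimal standard-library illustration; sweep_rows, AGGREGATORS, and n_points are hypothetical names, and the library's internal implementation may differ.

```python
from statistics import mean, median

AGGREGATORS = {"mean": mean, "median": median}

def sweep_rows(columns, x_axis, aggregate="mean", n_points=5):
    """Build rows where only `x_axis` varies and every other column is held at its aggregate."""
    agg = AGGREGATORS[aggregate]
    fixed = {name: agg(vals) for name, vals in columns.items() if name != x_axis}
    lo, hi = min(columns[x_axis]), max(columns[x_axis])
    step = (hi - lo) / (n_points - 1)
    return [{**fixed, x_axis: lo + i * step} for i in range(n_points)]

columns = {"x1": [0.0, 2.0, 4.0], "x2": [10.0, 20.0, 90.0]}
for row in sweep_rows(columns, "x1", aggregate="median", n_points=3):
    print(row)
```

Feeding such rows to a fitted regressor's predict() gives the predicted values that are drawn as a line against the swept variable.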
- classmethod linear_plot(x: str, y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colname: Optional[str] = None, ax=None, hue=None, linecolor='red', rounddigit=5, plot_scores=True, scatter_kws=None, legend_kws=None)¶
Plot linear regression line and calculate Pearson correlation coefficient.
- Parameters
x (str) – Variable that specifies positions on the x-axis.
y (str) – Variable that specifies positions on the y-axis.
data (pd.DataFrame) – Input data structure.
x_colname (str, optional) – Name of the explanatory variable. Available only if data is NOT pd.DataFrame.
ax (matplotlib.axes.Axes, optional) – Pre-existing axes for the plot. Otherwise, call matplotlib.pyplot.gca() internally.
hue (str, optional) – Grouping variable that will produce points with different colors.
linecolor (str, optional) – Color of regression line. See https://matplotlib.org/stable/gallery/color/named_colors.html
rounddigit (int, optional) – Number of decimal digits to which the scores are rounded.
plot_scores (bool, optional) – If True, display Pearson correlation coefficient and the p-value.
scatter_kws (dict, optional) – Additional parameters passed to sns.scatterplot(), e.g. alpha. See https://seaborn.pydata.org/generated/seaborn.scatterplot.html
legend_kws (dict) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html
- Returns
ax – Returns the Axes object with the plot drawn onto it.
- Return type
matplotlib.axes.Axes
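The quantities that linear_plot reports can be sketched with the standard least-squares formulas. The function name is hypothetical, and the p-value that the method also displays is omitted here because it requires a t-distribution that the standard library does not provide.

```python
import math

def linear_fit(xs, ys):
    """Return (slope, intercept, pearson_r) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r

slope, intercept, r = linear_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept, r)
```

The slope and the correlation coefficient always share a sign, since both are the cross-product sum divided by a positive quantity.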
- classmethod regression_heat_plot(estimator, x: List[str], y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colnames: Optional[List[str]] = None, x_heat: Optional[List[str]] = None, scatter_hue=None, pair_sigmarange=1.5, pair_sigmainterval=0.5, heat_extendsigma=0.5, heat_division=30, color_extendsigma=0.5, plot_scatter='true', rounddigit_rank=3, rounddigit_x1=2, rounddigit_x2=2, rounddigit_x3=2, rank_number=None, rank_col=None, cv=None, cv_seed=42, cv_group=None, display_cv_indices=0, estimator_params=None, fit_params=None, eval_set_selection=None, subplot_kws=None, heat_kws=None, scatter_kws=None, legend_kws=None)¶
Plot regression heatmaps of any scikit-learn regressor with 2 to 4D explanatory variables.
- Parameters
estimator (estimator object implementing fit) – Regression estimator. This is assumed to implement the scikit-learn estimator interface.
x (list[str] or np.ndarray) – Explanatory variables. Should be list[str] if data is pd.DataFrame. Should be np.ndarray if data is None.
y (str or np.ndarray) – Objective variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None.
data (pd.DataFrame) – Input data structure.
x_colnames (list[str], optional) – Names of explanatory variables. Available only if data is NOT pd.DataFrame.
x_heat (list[str], optional) – X-axis and y-axis variables of the heatmap. If None, the first two variables in x are used.
scatter_hue (str, optional) – Grouping variable that will produce points with different colors. Available only if plot_scatter is set to ‘hue’.
pair_sigmarange (float, optional) – Set the range of subplots. The lower limit is mean({x3, x4}) - pair_sigmarange * std({x3, x4}). The upper limit is mean({x3, x4}) + pair_sigmarange * std({x3, x4}). Available only if len(x) is greater than 2.
pair_sigmainterval (float, optional) – Set the interval of subplots. For example, if pair_sigmainterval is set to 0.5 and pair_sigmarange is set to 1.0, the ranges of subplots are lower than μ-1σ, μ-1σ to μ-0.5σ, μ-0.5σ to μ, μ to μ+0.5σ, μ+0.5σ to μ+1σ, and higher than μ+1σ. Available only if len(x) is greater than 2.
heat_extendsigma (float, optional) – Set the axis view limits of the heatmap. The lower limit is min({x1, x2}) - std({x1, x2}) * heat_extendsigma. The upper limit is max({x1, x2}) + std({x1, x2}) * heat_extendsigma.
heat_division (int, optional) – Resolution of the heatmap.
color_extendsigma (float, optional) – Set the colormap limits of the heatmap. The lower limit is min(y_true) - std(y_true) * color_extendsigma. The upper limit is max(y_true) + std(y_true) * color_extendsigma.
plot_scatter ({'error', 'true', 'hue'}, optional) – Color decision of the scatter plot. If ‘error’, points are colored by error value. If ‘true’, points are colored by y_true value. If ‘hue’, points are colored by the scatter_hue variable. If None, no scatter plot is drawn.
rounddigit_rank (int, optional) – Round the errors of the data that are in the top positions for regression error to a given precision in decimal digits.
rounddigit_x1 (int, optional) – Round the x-axis variable of the heatmap to a given precision in decimal digits.
rounddigit_x2 (int, optional) – Round the y-axis variable of the heatmap to a given precision in decimal digits.
rounddigit_x3 (int, optional) – Round the y-axis variable of subplots to a given precision in decimal digits.
rank_number (int, optional) – Number of emphasized data points that are in the top positions for regression error.
rank_col (str, optional) – Variable that is displayed with the emphasized data points that are in the top positions for regression error.
cv (int, cross-validation generator, or an iterable, optional) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, it specifies the number of folds in a KFold.
cv_seed (int, optional) – Seed for random number generator of cross validation.
cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to the groups argument of cv.split().
display_cv_indices (int or list, optional) – Cross validation index or indices to display.
estimator_params (dict, optional) – Parameters passed to the regression estimator. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
fit_params (dict, optional) – Parameters passed to the fit() method of the regression estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –
Select data passed to eval_set in fit_params. Available only if “estimator” is LightGBM or XGBoost.
If “all”, use all data in X and y.
If “train”, select train data from X and y using cv.split().
If “test”, select test data from X and y using cv.split().
If “original”, use raw eval_set.
If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.
subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
heat_kws (dict, optional) – Additional parameters passed to sns.heatmap(), e.g. cmap. See https://seaborn.pydata.org/generated/seaborn.heatmap.html
scatter_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.scatter(), e.g. alpha. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html
legend_kws (dict) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html
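The subplot ranges described for pair_sigmarange and pair_sigmainterval can be sketched in plain Python as follows. This only reproduces the ranges listed in the parameter description; the function name subplot_ranges is hypothetical, and the library may build its ranges differently (for example, in how boundary values are assigned).

```python
def subplot_ranges(mean, std, pair_sigmarange=1.5, pair_sigmainterval=0.5):
    """Return (lower, upper) pairs covering the whole axis.

    Hypothetical helper for illustration only; not a seaborn_analyzer function.
    Inner edges run from mean - pair_sigmarange * std to
    mean + pair_sigmarange * std in steps of pair_sigmainterval * std.
    """
    n_steps = round(2 * pair_sigmarange / pair_sigmainterval)
    edges = [
        mean + (-pair_sigmarange + i * pair_sigmainterval) * std
        for i in range(n_steps + 1)
    ]
    # Open-ended ranges below the first edge and above the last edge.
    bounds = [float("-inf")] + edges + [float("inf")]
    return list(zip(bounds[:-1], bounds[1:]))


# The example from the parameter description: interval 0.5, range 1.0.
ranges = subplot_ranges(mean=0.0, std=1.0, pair_sigmarange=1.0, pair_sigmainterval=0.5)
for lower, upper in ranges:
    print(lower, upper)
```

With these settings the sketch yields six ranges, matching the six listed for pair_sigmainterval = 0.5 and pair_sigmarange = 1.0.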
- classmethod regression_plot_1d(estimator, x: str, y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colname: Optional[str] = None, hue=None, linecolor='red', rounddigit=3, rank_number=None, rank_col=None, scores='mae', cv_stats='mean', cv=None, cv_seed=42, cv_group=None, estimator_params=None, fit_params=None, eval_set_selection=None, subplot_kws=None, scatter_kws=None, legend_kws=None)¶
Plot regression lines of any scikit-learn regressor with 1D explanatory variable.
- Parameters
estimator (estimator object implementing fit) – Regression estimator. This is assumed to implement the scikit-learn estimator interface.
x (str or np.ndarray) – Explanatory variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None.
y (str or np.ndarray) – Objective variable. Should be str if data is pd.DataFrame. Should be np.ndarray if data is None.
data (pd.DataFrame) – Input data structure.
x_colname (str, optional) – Name of the explanatory variable. Available only if data is NOT pd.DataFrame.
hue (str, optional) – Grouping variable that will produce points with different colors.
linecolor (str, optional) – Color of prediction = true line. See https://matplotlib.org/stable/gallery/color/named_colors.html
rounddigit (int, optional) – Number of decimal digits to which the scores are rounded.
rank_number (int, optional) – Number of emphasized data points that are in the top positions for regression error.
rank_col (list[str], optional) – Variables that are displayed with the emphasized data points that are in the top positions for regression error.
scores ({'r2', 'mae', 'mse', 'rmse', 'rmsle', 'mape', 'max_error'} or list, optional) – Regression scores that are displayed at the lower right of the graph.
cv_stats ({'mean', 'median', 'max', 'min'}, optional) – Statistic used to summarize the cross validation scores that are displayed at the lower right of the graph.
cv (int, cross-validation generator, or an iterable, optional) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, it specifies the number of folds in a KFold.
cv_seed (int, optional) – Seed for random number generator of cross validation.
cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to the groups argument of cv.split().
estimator_params (dict, optional) – Parameters passed to the regression estimator. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
fit_params (dict, optional) – Parameters passed to the fit() method of the regression estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –
Select data passed to eval_set in fit_params. Available only if “estimator” is LightGBM or XGBoost.
If “all”, use all data in X and y.
If “train”, select train data from X and y using cv.split().
If “test”, select test data from X and y using cv.split().
If “original”, use raw eval_set.
If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.
scatter_kws (dict, optional) – Additional parameters passed to sns.scatterplot(), e.g. alpha. See https://seaborn.pydata.org/generated/seaborn.scatterplot.html
legend_kws (dict) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html
- Returns
score_dict – Validation scores, e.g. r2, mae and rmse
- Return type
dict
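To make the scores option and the returned score_dict more concrete, here is a minimal plain-Python sketch of a few of the listed scores ('r2', 'mae', 'mse', 'rmse', 'max_error'). The function regression_scores is hypothetical and not the library's implementation; 'rmsle' and 'mape' are omitted for brevity, and the exact keys of the real score_dict are not confirmed by this page.

```python
import math


def regression_scores(y_true, y_pred, scores=("r2", "mae", "rmse"), rounddigit=3):
    """Return a dict mapping each requested score name to its rounded value.

    Hypothetical helper for illustration only; not a seaborn_analyzer function.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    available = {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "mse": ss_res / n,
        "rmse": math.sqrt(ss_res / n),
        "max_error": max(abs(e) for e in errors),
    }
    return {name: round(available[name], rounddigit) for name in scores}


score_dict = regression_scores([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(score_dict)
```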
- classmethod regression_pred_true(estimator, x: List[str], y: str, data: Optional[pandas.core.frame.DataFrame] = None, x_colnames: Optional[List[str]] = None, hue=None, linecolor='red', rounddigit=3, rank_number=None, rank_col=None, scores='mae', cv_stats='mean', cv=None, cv_seed=42, cv_group=None, ax=None, estimator_params=None, fit_params=None, eval_set_selection=None, subplot_kws=None, scatter_kws=None, legend_kws=None)¶
Plot prediction vs. true scatter plots of any scikit-learn regression estimator.
- Parameters
estimator (estimator object implementing fit) – Regression estimator. This is assumed to implement the scikit-learn estimator interface.
x (str or list[str]) – Explanatory variables.
y (str) – Objective variable.
data (pd.DataFrame) – Input data structure.
x_colnames (list[str], optional) – Names of explanatory variables. Available only if data is NOT pd.DataFrame.
hue (str, optional) – Grouping variable that will produce points with different colors.
linecolor (str, optional) – Color of prediction = true line. See https://matplotlib.org/stable/gallery/color/named_colors.html
rounddigit (int, optional) – Number of decimal digits to which the scores are rounded.
rank_number (int, optional) – Number of emphasized data points that are in the top positions for regression error.
rank_col (list[str], optional) – Variables that are displayed with the emphasized data points that are in the top positions for regression error.
scores ({'r2', 'mae', 'mse', 'rmse', 'rmsle', 'mape', 'max_error'} or list, optional) – Regression scores that are displayed at the lower right of the graph.
cv_stats ({'mean', 'median', 'max', 'min'}, optional) – Statistic used to summarize the cross validation scores that are displayed at the lower right of the graph.
cv (int, cross-validation generator, or an iterable, optional) – Determines the cross-validation splitting strategy. If None, the default 5-fold cross validation is used. If int, it specifies the number of folds in a KFold.
cv_seed (int, optional) – Seed for random number generator of cross validation.
cv_group (str, optional) – Group variable for the samples used while splitting the dataset into train/test set. This argument is passed to the groups argument of cv.split().
ax ({matplotlib.axes.Axes, list[matplotlib.axes.Axes]}, optional) – Pre-existing axes for the plot, or a list of them. Otherwise, call matplotlib.pyplot.subplot() internally.
estimator_params (dict, optional) – Parameters passed to the regression estimator. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
fit_params (dict, optional) – Parameters passed to the fit() method of the regression estimator, e.g. early_stopping_rounds and eval_set of XGBRegressor. If the estimator is a pipeline, each parameter name must be prefixed such that parameter p for step s has key s__p.
eval_set_selection ({'all', 'test', 'train', 'original', 'original_transformed'}, optional) –
Select data passed to eval_set in fit_params. Available only if estimator is LightGBM or XGBoost and cv is not None.
If “all”, use all data in X and y.
If “train”, select train data from X and y using cv.split().
If “test”, select test data from X and y using cv.split().
If “original”, use raw eval_set.
If “original_transformed”, use eval_set transformed by fit_transform() of the pipeline if the estimator is a pipeline.
subplot_kws (dict, optional) – Additional parameters passed to matplotlib.pyplot.subplots(), e.g. figsize. Available only if ax is None. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
scatter_kws (dict, optional) – Additional parameters passed to sns.scatterplot(), e.g. alpha. See https://seaborn.pydata.org/generated/seaborn.scatterplot.html
legend_kws (dict) – Additional parameters passed to ax.legend(), e.g. loc. See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html
- Returns
score_dict – Validation scores, e.g. r2, mae and rmse
- Return type
dict
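When cross validation is used, cv_stats selects how the per-fold scores are summarized into a single value per score. The sketch below shows one plausible reading of that step in plain Python. The names CV_STATS and summarize_cv_scores are hypothetical, the fold scores are made-up illustrative numbers, and the library's internal aggregation may differ.

```python
import statistics

# Maps each documented cv_stats option to an aggregation function (assumed mapping).
CV_STATS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
    "min": min,
}


def summarize_cv_scores(fold_scores, cv_stats="mean"):
    """Collapse a list of per-fold score dicts into one dict.

    Hypothetical helper for illustration only; not a seaborn_analyzer function.
    """
    aggregate = CV_STATS[cv_stats]
    names = fold_scores[0].keys()
    return {name: aggregate([fold[name] for fold in fold_scores]) for name in names}


# Illustrative per-fold scores for a 3-fold split (not real results).
folds = [
    {"mae": 0.25, "r2": 0.75},
    {"mae": 0.5, "r2": 0.5},
    {"mae": 0.75, "r2": 0.25},
]
print(summarize_cv_scores(folds, cv_stats="mean"))
print(summarize_cv_scores(folds, cv_stats="max"))
```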