Title: | Interactively Explore Local Explanations with the Radial Tour |
---|---|
Description: | Given a non-linear model, calculate the local explanation. We purpose view the data space, explanation space, and model residuals as ensemble graphic interactive on a shiny application. After an observation of interest is identified, the normalized variable importance of the local explanation is used as a 1D projection basis. The support of the local explanation is then explored by changing the basis with the use of the radial tour <doi:10.32614/RJ-2020-027>; <doi:10.1080/10618600.1997.10474754>. |
Authors: | Nicholas Spyrison [aut, cre] |
Maintainer: | Nicholas Spyrison <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4.0.0 |
Built: | 2024-11-04 03:37:25 UTC |
Source: | https://github.com/nspyrison/cheem |
Predictions and treeshap attribution of a random forest model of North Ames house sales data regressing Sales Price from house and lot variables.
ames_rf_pred ames_rf_shap
ames_rf_pred ames_rf_shap
ames_rf_pred
is a n=338 length vector of the prediction of an
random forest model predicting the numeric House Sales in North Ames.
ames_rf_shap
is a (338 x 9) data frame of the treeshap SHAP attribution of
the random forest model for each observation.
Replicating
library(cheem) library(randomForest) library(treeshap) set.seed(135) ## Regression setup dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## Model and treeSHAP ames_rf_fit <- randomForest::randomForest( X, Y, ntree = 125, mtry = ifelse(is_discrete(Y), sqrt(ncol(X)), ncol(X) / 3), nodesize = max(ifelse(is_discrete(Y), 1, 5), nrow(X) / 500)) ames_rf_pred <- predict(ames_rf_fit, X) ames_rf_shap <- treeshap::treeshap( treeshap::randomForest.unify(ames_rf_fit, X), X, FALSE, FALSE) ames_rf_shap <- ames_rf_shap$shaps if(F){ ## Don't accidentally save save(ames_rf_pred, file = "./data/ames_rf_pred.rda") save(ames_rf_shap, file = "./data/ames_rf_shap.rda") #usethis::use_data(ames_rf_pred) #usethis::use_data(ames_rf_shap) }
An object of class data.frame
with 338 rows and 9 columns.
library(cheem) ## Regression setup dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## Precomputed predictions and shap attribution str(ames_rf_pred) str(ames_rf_shap) ## Cheem ames_chm <- cheem_ls(X, Y, ames_rf_shap, ames_rf_pred, clas, label = "Ames, random forest, treeshap") ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(ames_chm, "./chm_ames_rf_tshap.rds") run_app() ## Select the saved rds file from the data dropdown. } ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(ames_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(ames_rf_shap, prim, comp) mv <- sug_manip_var(ames_rf_shap, primary_obs = prim, comp) ggt <- radial_cheem_tour(ames_chm, basis = bas, manip_var = mv) animate_plotly(ggt) }
library(cheem) ## Regression setup dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## Precomputed predictions and shap attribution str(ames_rf_pred) str(ames_rf_shap) ## Cheem ames_chm <- cheem_ls(X, Y, ames_rf_shap, ames_rf_pred, clas, label = "Ames, random forest, treeshap") ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(ames_chm, "./chm_ames_rf_tshap.rds") run_app() ## Select the saved rds file from the data dropdown. } ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(ames_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(ames_rf_shap, prim, comp) mv <- sug_manip_var(ames_rf_shap, primary_obs = prim, comp) ggt <- radial_cheem_tour(ames_chm, basis = bas, manip_var = mv) animate_plotly(ggt) }
House sales prices from Ames, Iowa, USA between 2006 and 2010. Only complete numeric observations remain.
amesHousing2018 amesHousing2018_raw amesHousing2018_NorthAmes
amesHousing2018 amesHousing2018_raw amesHousing2018_NorthAmes
complete data.frame with 2291 rows and 18 numeric variables, SalesPrice, the response variable, and 3 class variables
An object of class data.frame
with 2930 rows and 82 columns.
An object of class data.frame
with 338 rows and 11 columns.
Complete data.frame, n = 2291, 18 numeric variable (including 2 temporal: MoSold, YrSold ), response variable SalePrice, 3 class factors.
A simplified subsample, just North Ames (largest neighborhood). Complete data.frame, n = 338, 9 numeric variables, response variable SalePrice, 1 class factor SubclassMS, a zoning subclass.
Original data from Kaggle, 2930 rows of 82 variables. Sparse rows (639) and sparse/defaulted columns (64) are removed.
No data dictionary is provided on Kaggle, but amesHousing2018 variables are inferred to be:
LotFrontage, Length of the front (street facing) side of the lot in yards (0.914m)
LotArea, Area of the lot in square yards (0.836m^2)
OverallQual, Overall quality (of the house?), integer in (1, 10)
OverallCond, Overall condition (of the lot?), integer in (1, 10)
YearBuild, The year the house was originally built
BsmtUnfArea, Unfinished basement area, in square yards (0.836m^2)
TotBsmtArea, Total basement area, in square yards (0.836m^2)
1stFlrArea, First (ground) floor living area in square yards (0.836m^2)
LivingArea, Total living area in square yards (0.836m^2)
Bathrms, The number of bathrooms
Bedrms, The number of bedrooms
TotRms, The total number of rooms
GarageYrBlt, The year the garage was build
GarageCars, The number of car spaces in the garage
GarageArea, The area of the garage in square yards (0.836m^2)
MoSold, The number of the month of the house sale
YrSold, The number of the year of the house sale
SalePrice, The sale of the house in USD (as of the year of sale?)
SubclassMS, Factor subclass of construction zone, 16 levels
SubclassMS, Factor major class of construction zone, 7 levels
Neighborhd, Factor neighborhood of Ames, IA, 28 levels
De Cock, D. (2011). "Ames, Iowa: Alternative to the Boston Housing Data as an End of Semester Regression Project," Journal of Statistics Education, Volume 19, Number 3. http://jse.amstat.org/v19n3/decock/DataDocumentation.txt http://jse.amstat.org/v19n3/decock.pdf
Kaggle, Ames Housing Dataset https://www.kaggle.com/prevek18/ames-housing-dataset
Replicating this dataset:
if(FALSE) ## Don't accidentally open the URL. browseURL("https://www.kaggle.com/prevek18/ames-housing-dataset") ames <- readr::read_csv("./buildignore/AmesHousing.csv") amesHousing2018_raw <- data.frame(ames) ## save(amesHousing2018_raw, file = "./data/amesHousing2018_raw.rda") ## Complete rows and numeric variables ames1 <- ames[, unlist(lapply(ames, is.numeric))] ames1$Bathrooms <- ames1$`Full Bath` + ames1$`Half Bath` ames1 <- ames1[, c(1:18, 38, 19:37)] col_idx <- !(colnames(ames1) %in% c( "Order", "Mas Vnr Area", "BsmtFin SF 1", "BsmtFin SF 2", "Bsmt Full Bath", "Bsmt Half Bath", "Fireplaces", "Wood Deck SF", "Open Porch SF", "Enclosed Porch", "3Ssn Porch", "Screen Porch", "Pool Area", "Misc Val", "2nd Flr SF", "Low Qual Fin SF", "Full Bath", "Half Bath", "Kitchen AbvGr")) row_idx <- !is.na(ames1$"Garage Yr Blt") & !is.na(ames1$"Lot Frontage") & !is.na(ames1$"Bsmt Unf SF") & !is.na(ames1$"Total Bsmt SF") ames2 <- as.data.frame(ames1[row_idx, col_idx]) ## Looking for character classes to keep: ames_char <- ames[, unlist(lapply(ames, is.character))] ames_clas <- as.data.frame(lapply(ames_char, factor))[, -1] ames_clasint <- data.frame(lapply(ames_clas, as.integer)) col_idx_char <- which(names(ames_clas) %in% c("MS.SubClass", "MS.Zoning", "Neighborhood")) classes <- ames_clas[row_idx, col_idx_char] amesHousing2018 <- cbind(ames2, classes) names(amesHousing2018) <- c( "LotFrontage", "LotArea","OverallQual", "OverallCond", "YearBuild", "YearRemod", "BsmtUnfArea", "TotBsmtArea", "1stFlrArea", "LivingArea", "Bathrms", "Bedrms", "TotRms", "GarageYrBlt", "GarageCars", "GarageArea", "MoSold", "YrSold", "SalePrice", "SubclassMS", "ZoneMS", "Neighborhd") ## save(amesHousing2018, file = "./data/amesHousing2018.rda") .thin_col_idx <- names(amesHousing2018) %in% c( "LotArea", "OverallQual", "YearBuild", "LivingArea", "Bathrms", "Bedrms", "TotRms", "GarageYrBlt", "GarageArea", "SalePrice", "SubclassMS") amesHousing2018_thin <- amesHousing2018[, .thin_col_idx] ## subset to north ames, and only 5 largest subclasses r_idx <- amesHousing2018$Neighborhd == "NAmes" & amesHousing2018$SubclassMS %in% c("020", "050", "080", "090", "060") amesHousing2018_NorthAmes <- amesHousing2018_thin[r_idx, ] amesHousing2018_NorthAmes$SubclassMS <- factor( amesHousing2018_NorthAmes$SubclassMS, unique(amesHousing2018_NorthAmes$SubclassMS)) if(F){ ## Don't accidentally save save(amesHousing2018_NorthAmes, file = "./data/amesHousing2018_NorthAmes.rda")
library(cheem) ## Regression setup: dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## Cheem list ames_rf_chm <- cheem_ls(X, Y, ames_rf_shap, ames_rf_pred, clas, label = "North Ames, RF, SHAP") ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(ames_rf_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(ames_rf_chm, prim, comp) mv <- sug_manip_var(ames_rf_chm, primary_obs = prim, comp) ggt <- radial_cheem_tour(ames_rf_chm, basis = bas, manip_var = mv) animate_plotly(ggt) } ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(ames_rf_chm, "./chm_NAmes_rf_tshap.rds") run_app() ## Select the saved rds file from the data drop down. }
library(cheem) ## Regression setup: dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## Cheem list ames_rf_chm <- cheem_ls(X, Y, ames_rf_shap, ames_rf_pred, clas, label = "North Ames, RF, SHAP") ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(ames_rf_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(ames_rf_chm, prim, comp) mv <- sug_manip_var(ames_rf_chm, primary_obs = prim, comp) ggt <- radial_cheem_tour(ames_rf_chm, basis = bas, manip_var = mv) animate_plotly(ggt) } ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(ames_rf_chm, "./chm_NAmes_rf_tshap.rds") run_app() ## Select the saved rds file from the data drop down. }
Suggests a alpha opacity to plot with as a function of the number of observation.
as_logical_index(index, n)
as_logical_index(index, n)
index |
A vector, typically a numeric row index of the data to coerce to a logical index. |
n |
Single numeric, the number of rows of the data use as a replicate return length. |
A logical index of length n.
Other cheem utility:
color_scale_of()
,
contains_nonnumeric()
,
is_discrete()
,
is_diverging()
,
linear_tform()
,
logistic_tform()
,
problem_type()
,
rnorm_from()
,
sug_basis()
,
sug_manip_var()
library(cheem) ## Coerce a numeric index to logical as_logical_index(c(1, 4:10, 15), nrow(mtcars)) ## Logical indexs are unchanged as_logical_index(mtcars$mpg > 30, nrow(mtcars))
library(cheem) ## Coerce a numeric index to logical as_logical_index(c(1, 4:10, 15), nrow(mtcars)) ## Logical indexs are unchanged as_logical_index(mtcars$mpg > 30, nrow(mtcars))
Performs the preprocessing steps needs to supply the plot functions
global_view()
and radial_cheem_tour()
used in the shiny app.
The user need to supply the attributions and predictions. For help getting
started with this see vignette("getting-started-with-cheem")
.
cheem_ls( x, y = NULL, attr_df, pred = NULL, class = NULL, label = "label", verbose = getOption("verbose") )
cheem_ls( x, y = NULL, attr_df, pred = NULL, class = NULL, label = "label", verbose = getOption("verbose") )
x |
The explanatory variables of the model. |
y |
The target variable of the model. |
attr_df |
A data frame or matrix of (n, p) local explanation attributions. |
pred |
A (n, 1) vector, the predictions from the associated model. |
class |
Optional, (n, 1) vector, a variable to group points by.
This can be the same as or different from |
label |
Optionally provide a character label to store reminder text for the type of model and local explanation used. Defaults to "label". |
verbose |
Logical, if start time and run duration should be printed. Defaults to getOption("verbose"). |
A list of data.frames needed for the shiny
application.
global_view()
radial_cheem_tour()
radial_cheem_tour()
Other cheem preprocessing:
subset_cheem()
library(cheem) ## Classification setup X <- spinifex::penguins_na.rm[, 1:4] Y <- spinifex::penguins_na.rm$species clas <- spinifex::penguins_na.rm$species ## Bring your own attributions and predictions, ## for help review the vignette or package down site; if(FALSE){ vignette("getting-started-with-cheem") browseURL("https://nspyrison.github.io/cheem/articles/getting-started-with-cheem.html") } ## Cheem peng_chm <- cheem_ls(X, Y, penguin_xgb_shap, penguin_xgb_pred, clas, label = "Penguins, xgb, shapviz") ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(peng_chm, "./chm_peng_xgb_shapviz.rds") run_app() ## Select the saved rds file from the data dropdown. } ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(peng_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(penguin_xgb_shap, prim, comp) mv <- sug_manip_var(penguin_xgb_shap, primary_obs = prim, comp) ggt <- radial_cheem_tour(peng_chm, basis = bas, manip_var = mv) animate_plotly(ggt) } ## Regression example library(cheem) ## Regression setup dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS #' ## Bring your own attributions and predictions, ## for help review the vignette or package down site; if(FALSE){ vignette("getting-started-with-cheem") browseURL("https://nspyrison.github.io/cheem/articles/getting-started-with-cheem.html") } ## Cheem list ames_rf_chm <- cheem_ls(X, Y, ames_rf_shap, ames_rf_pred, clas, label = "North Ames, RF, treeshap") ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(ames_rf_chm, "./chm_NAmes_rf_tshap.rds") run_app() ## Select the saved rds file from the data drop down. } ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(ames_rf_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(ames_rf_shap, prim, comp) mv <- sug_manip_var(ames_rf_shap, primary_obs = prim, comp) ggt <- radial_cheem_tour(ames_rf_chm, basis = bas, manip_var = mv) animate_plotly(ggt) }
library(cheem) ## Classification setup X <- spinifex::penguins_na.rm[, 1:4] Y <- spinifex::penguins_na.rm$species clas <- spinifex::penguins_na.rm$species ## Bring your own attributions and predictions, ## for help review the vignette or package down site; if(FALSE){ vignette("getting-started-with-cheem") browseURL("https://nspyrison.github.io/cheem/articles/getting-started-with-cheem.html") } ## Cheem peng_chm <- cheem_ls(X, Y, penguin_xgb_shap, penguin_xgb_pred, clas, label = "Penguins, xgb, shapviz") ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(peng_chm, "./chm_peng_xgb_shapviz.rds") run_app() ## Select the saved rds file from the data dropdown. } ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(peng_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(penguin_xgb_shap, prim, comp) mv <- sug_manip_var(penguin_xgb_shap, primary_obs = prim, comp) ggt <- radial_cheem_tour(peng_chm, basis = bas, manip_var = mv) animate_plotly(ggt) } ## Regression example library(cheem) ## Regression setup dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS #' ## Bring your own attributions and predictions, ## for help review the vignette or package down site; if(FALSE){ vignette("getting-started-with-cheem") browseURL("https://nspyrison.github.io/cheem/articles/getting-started-with-cheem.html") } ## Cheem list ames_rf_chm <- cheem_ls(X, Y, ames_rf_shap, ames_rf_pred, clas, label = "North Ames, RF, treeshap") ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(ames_rf_chm, "./chm_NAmes_rf_tshap.rds") run_app() ## Select the saved rds file from the data drop down. } ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(ames_rf_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(ames_rf_shap, prim, comp) mv <- sug_manip_var(ames_rf_shap, primary_obs = prim, comp) ggt <- radial_cheem_tour(ames_rf_chm, basis = bas, manip_var = mv) animate_plotly(ggt) }
The chocolates data was compiled by students at Iowa State University of STAT503 (circa 2015) taught by Dianne Cook. Nutrition label information on the chocolates as listed on manufacturer websites. All numbers were normalized to be equivalent to a 100g serving. Units of measurement are listed in the variable name.
chocolates
chocolates
A complete data.frame with 88 observations and 10 numeric variables, name of the chocolate, manufacturer, country, and type of the chocolate.
Name, the name of the chocolate
MFR, chocolate manufacturer
Country, the country the manufacturer is incorporated.
Type, the type of chocolate according to the website, either 'Dark' or 'Milk"
Calories, the number of calories per 100 grams
CalFat, calories from fat per 100 grams
TotFat_g, grams of total fat per 100 grams
SatFat_g, grams of saturated fat per 100 grams
Chol_mg, milligrams of cholesterol per 100 grams
Na_mg, milligrams of sodium (salt) per 100 grams
Carbs_g, grams of carbohydrates per 100 grams
Fiber_g, grams of fiber per 100 grams
Sugars_g, grams of sugar per 100 grams
Protein_g, grams of sugar per 100 grams
Monash University, Introduction to Machine Learning course https://iml.numbat.space/
Replicating this dataset:
if(FALSE) ## Don't accidentally open the URL. browseURL("https://iml.numbat.space/") ## Accessed Jan 2022 chocolates <- readr::read_csv("https://iml.numbat.space/data/chocolates.csv") chocolates <- data.frame(chocolates) chocolates[, 2] <- factor(chocolates[, 2]) chocolates[, 3] <- factor(chocolates[, 3]) chocolates[, 4] <- factor(chocolates[, 4]) if(F){ ## Don't accidentally save save(chocolates, file = "./data/chocolates.rda")
library(cheem) ## Classification setup X <- chocolates[, 5:14] Y <- chocolates$Type clas <- chocolates$Type ## Cheem choc_chm <- cheem_ls(X, Y, chocolates_svm_shap, chocolates_svm_pred, clas, label = "Chocolates, LM, shap") ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(choc_chm, "./chm_chocolates_svm_shap.rds") run_app() ## Select the saved rds file from the data dropdown. } ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(peng_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(penguin_xgb_shap, prim, comp) mv <- sug_manip_var(penguin_xgb_shap, primary_obs = prim, comp) ggt <- radial_cheem_tour(peng_chm, basis = bas, manip_var = mv) animate_plotly(ggt) }
library(cheem) ## Classification setup X <- chocolates[, 5:14] Y <- chocolates$Type clas <- chocolates$Type ## Cheem choc_chm <- cheem_ls(X, Y, chocolates_svm_shap, chocolates_svm_pred, clas, label = "Chocolates, LM, shap") ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(choc_chm, "./chm_chocolates_svm_shap.rds") run_app() ## Select the saved rds file from the data dropdown. } ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(peng_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(penguin_xgb_shap, prim, comp) mv <- sug_manip_var(penguin_xgb_shap, primary_obs = prim, comp) ggt <- radial_cheem_tour(peng_chm, basis = bas, manip_var = mv) animate_plotly(ggt) }
Predictions and DALEX shap attribution of an svm model of Chocolate data classifying type of chocolate (light/dark).
chocolates_svm_pred chocolates_svm_shap
chocolates_svm_pred chocolates_svm_shap
chocolate_svm_pred
is a n=88 length vector of the prediction of an
svm model predicting the number of the factor level of the species of penguin.
chocolate_svm_shap
is a (88 x 10) data frame of the DALEX SHAP attribution
of the svm model for each observation.
Replicating
library(cheem) library(e1071) library(DALEX) set.seed(135) ## Classification setup X <- chocolates[, 5:14] Y <- chocolates$Type clas <- chocolates$Type ## Model and predict choc_svm_fit <- svm( formula = Y ~ ., data = data.frame(Y, X), type = 'C-classification', kernel = 'linear', probability = TRUE) chocolates_svm_pred <- predict(choc_svm_fit, data.frame(Y, X)) ## SHAP via DALEX, versatile but slow choc_svm_exp <- explain(choc_svm_fit, data = X, y = Y, label = "Chocolates, svm") ## Note that cheem expects a full [n, p] attribution space ## Shap takes about ~30-40 sec for me chocolates_svm_shap <- matrix(NA, nrow(X), ncol(X)) ## init a df of the same structure sapply(1:nrow(X), function(i){ pps <- predict_parts_shap(choc_svm_exp, new_observation = X[i, ]) ## Keep just the [n, p] local explanations chocolates_svm_shap[i, ] <<- tapply( pps$contribution, pps$variable, mean, na.rm = TRUE) %>% as.vector() }) chocolates_svm_shap <- as.data.frame(chocolates_svm_shap) if(F){ ## Don't accidentally save save(chocolates_svm_pred, file = "./data/chocolates_svm_pred.rda") save(chocolates_svm_shap, file = "./data/chocolates_svm_shap.rda") #usethis::use_data(chocolates_svm_pred) #usethis::use_data(chocolates_svm_shap) }
An object of class data.frame
with 88 rows and 10 columns.
library(cheem) ## Classification setup X <- chocolates[, 5:14] Y <- chocolates$Type clas <- chocolates$Type ## Precomputed predictions and shap attribution str(chocolates_svm_pred) str(chocolates_svm_shap) ## Cheem choc_chm <- cheem_ls(X, Y, chocolates_svm_shap, chocolates_svm_pred, clas, label = "Chocolates, SVM, shap") ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(choc_chm, "./cmh_chocolates_svm_shap.rds") run_app() ## Select the saved rds file from the data dropdown. } ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(choc_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(chocolates_svm_shap, prim, comp) mv <- sug_manip_var(chocolates_svm_shap, primary_obs = prim, comp) ggt <- radial_cheem_tour(choc_chm, basis = bas, manip_var = mv) animate_plotly(ggt) }
library(cheem) ## Classification setup X <- chocolates[, 5:14] Y <- chocolates$Type clas <- chocolates$Type ## Precomputed predictions and shap attribution str(chocolates_svm_pred) str(chocolates_svm_shap) ## Cheem choc_chm <- cheem_ls(X, Y, chocolates_svm_shap, chocolates_svm_pred, clas, label = "Chocolates, SVM, shap") ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(choc_chm, "./cmh_chocolates_svm_shap.rds") run_app() ## Select the saved rds file from the data dropdown. } ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(choc_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(chocolates_svm_shap, prim, comp) mv <- sug_manip_var(chocolates_svm_shap, primary_obs = prim, comp) ggt <- radial_cheem_tour(choc_chm, basis = bas, manip_var = mv) animate_plotly(ggt) }
Whether or not a vector is a diverges a value, returns a logical. Used to help default a scale_color for ggplot2.
color_scale_of(x, mid_pt = 0, limits = NULL, ...)
color_scale_of(x, mid_pt = 0, limits = NULL, ...)
x |
A vector to to scale the color to. |
mid_pt |
A single number checking divergence from. Defaults to 0. |
limits |
A vector of the min and max values for the scale. Useful for setting an absolute range, such as (-1, 1) for attribution/y correlation of each point. Points outside of limits show as default grey. Defaults to NULL; the range of x. |
... |
Optional other arguments passed to ggplot2::continuous_scale or ggplot2::discrete_scale. |
A list containing a scale_color, and scale_fill; the suggested color/fill scale for a ggplot.
Other cheem utility:
as_logical_index()
,
contains_nonnumeric()
,
is_discrete()
,
is_diverging()
,
linear_tform()
,
logistic_tform()
,
problem_type()
,
rnorm_from()
,
sug_basis()
,
sug_manip_var()
library(cheem) library(ggplot2) g <- ggplot(mtcars, aes(disp, mpg)) ## Discrete g + geom_point(aes(color = factor(vs))) + color_scale_of(mtcars$vs) ## Sequential increasing g + geom_point(aes(color = mpg)) + color_scale_of(mtcars$mpg) ## Dummy sequential decr g + geom_point(aes(color = -1 *mpg)) + color_scale_of(-1 * mtcars$mpg) ## Dummy diverging g + geom_point(aes(color = mpg - median(mpg))) + color_scale_of(mtcars$mpg - median(mtcars$mpg)) ## Dummy limits g + geom_point(aes(color = mpg - median(mpg))) + color_scale_of(mtcars$mpg - median(mtcars$mpg), limits = c(-5, 5))
library(cheem) library(ggplot2) g <- ggplot(mtcars, aes(disp, mpg)) ## Discrete g + geom_point(aes(color = factor(vs))) + color_scale_of(mtcars$vs) ## Sequential increasing g + geom_point(aes(color = mpg)) + color_scale_of(mtcars$mpg) ## Dummy sequential decr g + geom_point(aes(color = -1 *mpg)) + color_scale_of(-1 * mtcars$mpg) ## Dummy diverging g + geom_point(aes(color = mpg - median(mpg))) + color_scale_of(mtcars$mpg - median(mtcars$mpg)) ## Dummy limits g + geom_point(aes(color = mpg - median(mpg))) + color_scale_of(mtcars$mpg - median(mtcars$mpg), limits = c(-5, 5))
Returns a logical, whether or not a vector contains any non-numeric characters. Typically used to test if row names hold non-index information.
contains_nonnumeric(x)
contains_nonnumeric(x)
x |
A vector to be tested for existence of non-numeric characters. |
Logical, whether or not x contains any non-numeric characters.
Other cheem utility:
as_logical_index()
,
color_scale_of()
,
is_discrete()
,
is_diverging()
,
linear_tform()
,
logistic_tform()
,
problem_type()
,
rnorm_from()
,
sug_basis()
,
sug_manip_var()
library(cheem) contains_nonnumeric(mtcars$mpg) contains_nonnumeric(rownames(mtcars)) ## Meaningful info to use in tooltip contains_nonnumeric(rownames(cars)) ## Assume no meaningful info to use in tooltip
library(cheem) contains_nonnumeric(mtcars$mpg) contains_nonnumeric(rownames(mtcars)) ## Meaningful info to use in tooltip contains_nonnumeric(rownames(cars)) ## Assume no meaningful info to use in tooltip
Send a message if the 4th chunk of the package version is 9000.
devMessage(text)
devMessage(text)
text |
A character string to message() if package version is 9000. |
plotly
display, global view of data and attribution space.from a cheem_ls() list, create a linked plotly
of the global data-
and attribution- spaces. Typically consumed directly by shiny app.
global_view( cheem_ls, primary_obs = NULL, comparison_obs = NULL, color = c("default", "residual", "log_maha.data", "cor_attr_proj.y"), height_px = 480, width_px = 1440, as_ggplot = FALSE )
global_view( cheem_ls, primary_obs = NULL, comparison_obs = NULL, color = c("default", "residual", "log_maha.data", "cor_attr_proj.y"), height_px = 480, width_px = 1440, as_ggplot = FALSE )
cheem_ls |
A return from |
primary_obs |
The rownumber of the primary observation. Its local attribution becomes the 1d projection basis, and the point it highlighted as a dashed line. Defaults to NULL, no highlighting applied. |
comparison_obs |
The rownumber of the comparison observation. Point is highlighted as a dotted line. Defaults to NULL, no highlighting applied. |
color |
The name of the column in cheem_ls$global_view_df to map to color. Expects c("default", "residual", "log_maha.data", "cor_attr_proj.y"). Defaults to "default"; predicted_class for classification, dummy class for regression. |
height_px |
The height in pixels of the returned |
width_px |
The width in pixels of the returned |
as_ggplot |
Logical, if TRUE returns the plots before being passed to
|
A plotly
plot, an interactive html widget of the global view,
first two components of the basis of the data- and attribution- spaces.
Other cheem consumers:
global_view_legwork()
,
radial_cheem_tour()
,
run_app()
library(cheem) ## Regression setup: dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## global_view() ames_rf_chm <- cheem_ls(X, Y, ames_rf_shap, ames_rf_pred, clas, label = "North Ames, RF, SHAP") if(interactive()){ global_view(ames_rf_chm, as_ggplot = TRUE) ## early return of ggplot global_view(ames_rf_chm) ## uses ggplot facets %>% plotly ## Different color mappings, especially for regression global_view(ames_rf_chm, color = "residual") global_view(ames_rf_chm, color = "log_maha.data") global_view(ames_rf_chm, color = "cor_attr_proj.y") } ## Also consumed by: ?run_app()
library(cheem) ## Regression setup: dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## global_view() ames_rf_chm <- cheem_ls(X, Y, ames_rf_shap, ames_rf_pred, clas, label = "North Ames, RF, SHAP") if(interactive()){ global_view(ames_rf_chm, as_ggplot = TRUE) ## early return of ggplot global_view(ames_rf_chm) ## uses ggplot facets %>% plotly ## Different color mappings, especially for regression global_view(ames_rf_chm, color = "residual") global_view(ames_rf_chm, color = "log_maha.data") global_view(ames_rf_chm, color = "cor_attr_proj.y") } ## Also consumed by: ?run_app()
Internal function consumed in the cheem workflow. Produces the plot data.frame of 1 layer. consumed downstream in cheem_ls.
global_view_df_1layer(x, class = NULL, label = "label")
global_view_df_1layer(x, class = NULL, label = "label")
x |
The explanatory variables of the model. |
class |
Optional, (n, 1) vector, a variable to group points by.
This can be the same as or different from |
label |
Optionally provide a character label to store reminder text for the type of model and local explanation used. Defaults to "label". |
A data.frame, for the global linked plotly display.
Not meant for end user. This is an internal legwork function to keep from repeating global_view code.
global_view_legwork( cheem_ls, primary_obs = NULL, comparison_obs = NULL, color = c("default", "residual", "log_maha.data", "cor_attr_proj.y") )
global_view_legwork( cheem_ls, primary_obs = NULL, comparison_obs = NULL, color = c("default", "residual", "log_maha.data", "cor_attr_proj.y") )
cheem_ls |
A return from |
primary_obs |
The rownumber of the primary observation. Its local attribution becomes the 1d projection basis, and the point it highlighted as a dashed line. Defaults to NULL, no highlighting applied. |
comparison_obs |
The rownumber of the comparison observation. Point is highlighted as a dotted line. Defaults to NULL, no highlighting applied. |
color |
The name of the column in cheem_ls$global_view_df to map to color. Expects c("default", "residual", "log_maha.data", "cor_attr_proj.y"). Defaults to "default"; predicted_class for classification, dummy class for regression. |
a ggplot2
object, waiting to be rendered with global_view
or
global_view_subplots
.
Other cheem consumers:
global_view()
,
radial_cheem_tour()
,
run_app()
Evaluate the expression if the 4th chunk of the package version is 9000.
ifDev(expr)
ifDev(expr)
expr |
A character string to message() if package version is 9000. |
Whether or not a vector is a discrete variable, returns a logical. Typically used on the Y variable of a model.
is_discrete(x, na.rm = TRUE)
is_discrete(x, na.rm = TRUE)
x |
A vector to check the discreteness of. |
na.rm |
Whether or not to remove NA values before testing discreteness. Defaults to TRUE. |
Logical, whether or not x
is a discrete variable.
Other cheem utility:
as_logical_index()
,
color_scale_of()
,
contains_nonnumeric()
,
is_diverging()
,
linear_tform()
,
logistic_tform()
,
problem_type()
,
rnorm_from()
,
sug_basis()
,
sug_manip_var()
library(cheem) is_discrete(mtcars$mpg) ## Numeric column, with more than 25 unique values. is_discrete(mtcars$cyl) ## Numeric column, labeled as discrete, because less than 25 unique values is_discrete(letters) ## Characters and factors labeled discrete.
library(cheem) is_discrete(mtcars$mpg) ## Numeric column, with more than 25 unique values. is_discrete(mtcars$cyl) ## Numeric column, labeled as discrete, because less than 25 unique values is_discrete(letters) ## Characters and factors labeled discrete.
Whether or not a vector is a diverges a value, returns a logical. Used to help default a scale_color for ggplot2.
is_diverging(x, mid_pt = 0)
is_diverging(x, mid_pt = 0)
x |
A vector to check the divergence of. |
mid_pt |
A single number checking divergence from. Defaults to 0. |
Logical, whether or not x
is a diverges mid_pt
.
Other cheem utility:
as_logical_index()
,
color_scale_of()
,
contains_nonnumeric()
,
is_discrete()
,
linear_tform()
,
logistic_tform()
,
problem_type()
,
rnorm_from()
,
sug_basis()
,
sug_manip_var()
library(cheem) is_diverging(-10:10) is_diverging(-10:-5) is_diverging(mtcars$mpg, 25) is_diverging(mtcars$mpg, 40)
library(cheem) is_diverging(-10:10) is_diverging(-10:-5) is_diverging(mtcars$mpg, 25) is_diverging(mtcars$mpg, 40)
Suggests a alpha opacity to plot with as a function of the number of observation.
linear_tform(n, appox_max_n = 5000, ceiling = 1, floor = 0.3)
linear_tform(n, appox_max_n = 5000, ceiling = 1, floor = 0.3)
n |
Number of observations to plot. |
appox_max_n |
The number of observation to reach floor opacity. |
ceiling |
The highest number returned. Defaults to 1. |
floor |
The lowest number returned. Defaults to 0.3. |
A scalar numeric, suggested value to set alpha opacity.
Other cheem utility:
as_logical_index()
,
color_scale_of()
,
contains_nonnumeric()
,
is_discrete()
,
is_diverging()
,
logistic_tform()
,
problem_type()
,
rnorm_from()
,
sug_basis()
,
sug_manip_var()
library(cheem) ## Suggest an opacity to use in plotting: (my_alpha <- linear_tform(nrow(spinifex::penguins_na.rm))) ## Visualize x <- 1:2000 plot(x, sapply(x, linear_tform), col = "blue")
library(cheem) ## Suggest an opacity to use in plotting: (my_alpha <- linear_tform(nrow(spinifex::penguins_na.rm))) ## Visualize x <- 1:2000 plot(x, sapply(x, linear_tform), col = "blue")
Suggests a alpha opacity to plot with as a function of the number of observation.
logistic_tform(n, mid_pt = 600, k_attenuation = 5, ceiling = 1, floor = 0.3)
logistic_tform(n, mid_pt = 600, k_attenuation = 5, ceiling = 1, floor = 0.3)
n |
Number of observations to plot. |
mid_pt |
Inflection point that the logistic curve. Defaults to 600. |
k_attenuation |
The steepness of the transition, larger is a sharper transition. Quite sensitive and defaults to 5. |
ceiling |
The highest number returned. Defaults to 1. |
floor |
The lowest number returned. Defaults to 0.3. |
A scalar numeric, suggested value to set alpha opacity.
Other cheem utility:
as_logical_index()
,
color_scale_of()
,
contains_nonnumeric()
,
is_discrete()
,
is_diverging()
,
linear_tform()
,
problem_type()
,
rnorm_from()
,
sug_basis()
,
sug_manip_var()
library(cheem) ## Suggest an opacity to use in plotting: (my_alpha <- logistic_tform(nrow(spinifex::penguins_na.rm))) ## Visualize x <- 1:2000 plot(x, logistic_tform(x), col = "blue")
library(cheem) ## Suggest an opacity to use in plotting: (my_alpha <- logistic_tform(nrow(spinifex::penguins_na.rm))) ## Visualize x <- 1:2000 plot(x, logistic_tform(x), col = "blue")
Internal function, used downstream in cheem_ls.
model_performance(y, pred, label = "label")
model_performance(y, pred, label = "label")
y |
Observed response, required by ranger models. |
pred |
A (n, 1) vector of the predicted values for each observation.
Typically obtainable with |
label |
Optionally provide a character label to store reminder text for the type of model and local explanation used. Defaults to "label". |
A data.frame of model performance statistics.
Predictions and shapviz attribution of an xgb model of Penguin data classifying penguin species.
penguin_xgb_pred penguin_xgb_shap
penguin_xgb_pred penguin_xgb_shap
penguin_xgb_pred
is a n=333 length vector of the prediction of an
xgb model predicting the number of the factor level of the species of penguin.
penguin_xgb_shap
is a (333 x 4) data frame of the shapviz SHAP attribution of
the xgb model for each observation.
Replicating
library(cheem) library(xgboost) library(shapviz) set.seed(135) ## Classification setup X <- spinifex::penguins_na.rm[, 1:4] Y <- spinifex::penguins_na.rm$species clas <- spinifex::penguins_na.rm$species ## Model and predict peng_train <- data.matrix(X) %>% xgb.DMatrix(label = Y) peng_xgb_fit <- xgboost(data = peng_train, max.depth = 3, nrounds = 5) penguin_xgb_pred <- predict(peng_xgb_fit, newdata = peng_train) ## shapviz penguin_xgb_shap <- shapviz(peng_xgb_fit, X_pred = peng_train, X = X) penguin_xgb_shap <- penguin_xgb_shap$S if(F){ ## Don't accidentally save save(penguin_xgb_pred, file = "./data/penguin_xgb_pred.rda") save(penguin_xgb_shap, file = "./data/penguin_xgb_shap.rda") #usethis::use_data(penguin_xgb_pred) #usethis::use_data(penguin_xgb_shap) }
An object of class matrix
(inherits from array
) with 333 rows and 4 columns.
library(cheem) ## Classification setup X <- spinifex::penguins_na.rm[, 1:4] Y <- spinifex::penguins_na.rm$species clas <- spinifex::penguins_na.rm$species ## Precomputed predictions and shap attribtion str(penguin_xgb_pred) str(penguin_xgb_shap) ## Cheem peng_chm <- cheem_ls(X, Y, penguin_xgb_shap, penguin_xgb_pred, clas, label = "Penguins, xgb, shapviz") ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(peng_chm, "./chm_peng_xgb_shapviz.rds") run_app() ## Select the saved rds file from the data dropdown. } ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(peng_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(penguin_xgb_shap, prim, comp) mv <- sug_manip_var(penguin_xgb_shap, primary_obs = prim, comp) ggt <- radial_cheem_tour(peng_chm, basis = bas, manip_var = mv) animate_plotly(ggt) }
library(cheem) ## Classification setup X <- spinifex::penguins_na.rm[, 1:4] Y <- spinifex::penguins_na.rm$species clas <- spinifex::penguins_na.rm$species ## Precomputed predictions and shap attribtion str(penguin_xgb_pred) str(penguin_xgb_shap) ## Cheem peng_chm <- cheem_ls(X, Y, penguin_xgb_shap, penguin_xgb_pred, clas, label = "Penguins, xgb, shapviz") ## Save for use with shiny app (expects an rds file) if(FALSE){ ## Don't accidentally save. saveRDS(peng_chm, "./chm_peng_xgb_shapviz.rds") run_app() ## Select the saved rds file from the data dropdown. } ## Cheem visuals if(interactive()){ prim <- 1 comp <- 2 global_view(peng_chm, primary_obs = prim, comparison_obs = comp) bas <- sug_basis(penguin_xgb_shap, prim, comp) mv <- sug_manip_var(penguin_xgb_shap, primary_obs = prim, comp) ggt <- radial_cheem_tour(peng_chm, basis = bas, manip_var = mv) animate_plotly(ggt) }
Whether the Y is a "classification", "regression" or ill-defined problem.
Returns a character: "classification", "regression", or an error for strange
classes. Minor redundancy with is_discrete, though explicit. Could be useful
for DALEX::explain(type)
as it also expects "classification" or
"regression".
problem_type(y)
problem_type(y)
y |
Response variable to be modeled |
Character either c("classification", "regression") specifying the assumed model task based on the discreteness of y.
Other cheem utility:
as_logical_index()
,
color_scale_of()
,
contains_nonnumeric()
,
is_discrete()
,
is_diverging()
,
linear_tform()
,
logistic_tform()
,
rnorm_from()
,
sug_basis()
,
sug_manip_var()
library(cheem) problem_type(mtcars$mpg) ## Numeric, many values problem_type(mtcars$cyl) ## Numeric column, labeled as discrete, because less than 25 unique values problem_type(letters) ## Character to classification problem_type(factor(letters)) ## Factor to classification
library(cheem) problem_type(mtcars$mpg) ## Numeric, many values problem_type(mtcars$cyl) ## Numeric column, labeled as discrete, because less than 25 unique values problem_type(letters) ## Character to classification problem_type(factor(letters)) ## Factor to classification
A spinifex
proto_*-like function, that adds the distribution of
orthonormalized row values of the specified local explanation attr_df
.
Does not draw the basis bars; use in conjunction with proto_basis1d()
.
proto_basis1d_distribution( attr_df, primary_obs = NULL, comparison_obs = NULL, position = c("top1d", "floor1d", "bottom1d", "off"), group_by = as.factor(FALSE), pcp_shape = c(3, 142, 124), do_add_pcp_segments = TRUE, inc_var_nms = NULL, row_index = NULL )
proto_basis1d_distribution( attr_df, primary_obs = NULL, comparison_obs = NULL, position = c("top1d", "floor1d", "bottom1d", "off"), group_by = as.factor(FALSE), pcp_shape = c(3, 142, 124), do_add_pcp_segments = TRUE, inc_var_nms = NULL, row_index = NULL )
attr_df |
An data frame, the attributions of a local explanation. |
primary_obs |
The rownumber of the primary observation. Its local attribution becomes the 1d projection basis, and the point it highlighted as a dashed line. Defaults to NULL, no highlighting. |
comparison_obs |
The rownumber of the comparison observation. Point is highlighted as a dotted line. Defaults to NULL, no highlighting. |
position |
The position for the basis, one of: c("top1d", "floor1d", "bottom1d", "off"). Defaults to "top1d"; basis above the density curves. |
group_by |
Vector to group densities by. Originally predicted class. |
pcp_shape |
The number of the shape character to add. Expects
3, 142, or 124, '+', '|' in |
do_add_pcp_segments |
Logical, whether or not to add to add faint parallel coordinate lines on the 1D basis. Defaults to TRUE. |
inc_var_nms |
A character vector, the names of the variables to keep. Defaults to NULL, all variables kept. |
row_index |
A numeric or logical vector, the index of the rows to keep. Defaults to NULL, all rows kept. |
A ggplot
object of the the distribution of the local explanation's
attributions.
library(cheem) library(spinifex) ## Regression setup dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## Basis, manipulation var, manual tour path, & predictions to fix to y-axis bas <- sug_basis(ames_rf_shap, 1) mv <- sug_manip_var(ames_rf_shap, 1, 2) mt_path <- manual_tour(bas, mv) ## Also consumed by: ?radial_cheem_tour() ## Compose and animate the tour ggt <- ggtour(mt_path, scale_sd(X), angle = .3) + append_fixed_y(fixed_y = scale_sd(Y)) + proto_point(list(color = clas, shape = clas)) + proto_basis1d_distribution( attr_df = ames_rf_shap, primary_obs = 1, comparison_obs = 2, position = "top1d", group_by = clas) + proto_basis1d(position = "bottom1d") + proto_origin() if(interactive()){ animate_plotly(ggt) }
library(cheem) library(spinifex) ## Regression setup dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## Basis, manipulation var, manual tour path, & predictions to fix to y-axis bas <- sug_basis(ames_rf_shap, 1) mv <- sug_manip_var(ames_rf_shap, 1, 2) mt_path <- manual_tour(bas, mv) ## Also consumed by: ?radial_cheem_tour() ## Compose and animate the tour ggt <- ggtour(mt_path, scale_sd(X), angle = .3) + append_fixed_y(fixed_y = scale_sd(Y)) + proto_point(list(color = clas, shape = clas)) + proto_basis1d_distribution( attr_df = ames_rf_shap, primary_obs = 1, comparison_obs = 2, position = "top1d", group_by = clas) + proto_basis1d(position = "bottom1d") + proto_origin() if(interactive()){ animate_plotly(ggt) }
Create a linked plotly
of the global data-
and attribution- spaces. Typically consumed directly by shiny app.
radial_cheem_tour( cheem_ls, basis, manip_var, primary_obs = NULL, comparison_obs = NULL, do_add_pcp_segments = TRUE, pcp_shape = c(3, 142, 124), angle = 0.15, row_index = NULL, inc_var_nms = NULL, do_center_frame = TRUE, do_add_residual = FALSE )
radial_cheem_tour( cheem_ls, basis, manip_var, primary_obs = NULL, comparison_obs = NULL, do_add_pcp_segments = TRUE, pcp_shape = c(3, 142, 124), angle = 0.15, row_index = NULL, inc_var_nms = NULL, do_center_frame = TRUE, do_add_residual = FALSE )
cheem_ls |
A return from |
basis |
A 1D projection basis, typically a return of
|
manip_var |
The , number of the manipulation variable. |
primary_obs |
The rownumber of the primary observation. Its local attribution becomes the 1d projection basis, and the point it highlighted as a dashed line. Defaults to NULL, no primary observation highlighted. |
comparison_obs |
The rownumber of the comparison observation. Point is highlighted as a dotted line. Defaults to NULL, no comparison observation highlighted. |
do_add_pcp_segments |
Logical, whether or not to add parallel coordinate line segments to the basis display. |
pcp_shape |
The number of the shape character to add. Expects
3, 142, or 124, '+', '|' in |
angle |
The step size between interpolated frames, in radians. Defaults to .15. |
row_index |
Numeric index of selected observations. Defaults to TRUE; 1:n. |
inc_var_nms |
A vector of the names of the variables to include in the projection. |
do_center_frame |
Whether or not to scale by standard deviations away from the mean within each frame or not. Defaults to TRUE, helping to keep the animation centered. |
do_add_residual |
Whether of not to add a facet with a fixed y on residual. Doing so may cause issues with animation. Defaults to FALSE. |
ggtour (ggplot2
object with frame info) animation frames of a
radial tour manipulating the contribution of a selected tour. Animated with
spinifex::animate_*
functions.
Other cheem consumers:
global_view_legwork()
,
global_view()
,
run_app()
library(cheem) ## Regression setup: dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## radial_cheem_tour() ames_rf_chm <- cheem_ls(X, Y, ames_rf_shap, ames_rf_pred, clas, label = "North Ames, RF, SHAP") bas <- sug_basis(ames_rf_shap, 1) mv <- sug_manip_var(ames_rf_shap, 1, 2) ggt <- radial_cheem_tour(ames_rf_chm, basis = bas, manip_var = mv) if(interactive()){ ## As a plotly html widget spinifex::animate_plotly(ggt) ## As a gganimation spinifex::animate_gganimate(ggt, render = gganimate::av_renderer()) } ## Also used in: ?run_app()
library(cheem) ## Regression setup: dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## radial_cheem_tour() ames_rf_chm <- cheem_ls(X, Y, ames_rf_shap, ames_rf_pred, clas, label = "North Ames, RF, SHAP") bas <- sug_basis(ames_rf_shap, 1) mv <- sug_manip_var(ames_rf_shap, 1, 2) ggt <- radial_cheem_tour(ames_rf_chm, basis = bas, manip_var = mv) if(interactive()){ ## As a plotly html widget spinifex::animate_plotly(ggt) ## As a gganimation spinifex::animate_gganimate(ggt, render = gganimate::av_renderer()) } ## Also used in: ?run_app()
Creates new observation of the data given its specific means and shapes. typically applied to a cluster subset of data. ie draw from cluster 'a', then assign to cluster 'b'.
rnorm_from( data, n_obs = 1, var_coeff = 1, method = c("pearson", "kendall", "spearman") )
rnorm_from( data, n_obs = 1, var_coeff = 1, method = c("pearson", "kendall", "spearman") )
data |
A data.frame or matrix to sample from. |
n_obs |
Number of new observations to draw. Defaults to 1. |
var_coeff |
Variance coefficient, closer to 0 make points near the median, above 1 makes more points further away from the median. Defaults to 1. |
method |
The method of the covariance matrix. Expects "person" (continuous numeric), "kendall" or "spearman" (latter two are ranked based ordinal). |
A data.frame, sampled observations given the means and covariance of the data based on with column names kept.
Other cheem utility:
as_logical_index()
,
color_scale_of()
,
contains_nonnumeric()
,
is_discrete()
,
is_diverging()
,
linear_tform()
,
logistic_tform()
,
problem_type()
,
sug_basis()
,
sug_manip_var()
library(cheem) sub <- mtcars[mtcars$cyl == 6, ] ## Draw 3 new observations in the shape of 6 cylinder vehicles, with reduced variance. rnorm_from(data = sub, n_obs = 3, var_coeff = .5)
library(cheem) sub <- mtcars[mtcars$cyl == 6, ] ## Draw 3 new observations in the shape of 6 cylinder vehicles, with reduced variance. rnorm_from(data = sub, n_obs = 3, var_coeff = .5)
Runs a local shiny app that demonstrates manual tour and comparable traditional techniques for static projections of multivariate data sets.
run_app(app_nm = "cheem", ...)
run_app(app_nm = "cheem", ...)
app_nm |
Name of the shiny app to run. Expects "cheem_initial". |
... |
Other arguments passed into |
Runs a locally hosted shiny app.
Other cheem consumers:
global_view_legwork()
,
global_view()
,
radial_cheem_tour()
## Only run this example in interactive R sessions if(interactive()){ ## Runs the app run_app("cheem") ## Run with app code displayed run_app(app_nm = "cheem", display.mode = "showcase") }
## Only run this example in interactive R sessions if(interactive()){ ## Runs the app run_app("cheem") ## Run with app code displayed run_app(app_nm = "cheem", display.mode = "showcase") }
Given a numerical index of rownumbers of the original data, subset the correct elements of a cheem list.
subset_cheem(cheem_ls, rownumbers = 1:500)
subset_cheem(cheem_ls, rownumbers = 1:500)
cheem_ls |
The return of a |
rownumbers |
A vector of the numeric index of rownumbers to keep.
To use a logical index, pass it to |
A subset of the supplied cheem_ls.
Other cheem preprocessing:
cheem_ls()
library(cheem) ## Classification setup X <- spinifex::penguins_na.rm[, 1:4] Y <- spinifex::penguins_na.rm$species clas <- spinifex::penguins_na.rm$species ## Cheem peng_chm <- cheem_ls(X, Y, penguin_xgb_shap, penguin_xgb_pred, clas, label = "Penguins, xgb, shapviz") lapply(peng_chm, NROW) ## subset a cheem list num_idx <- which(spinifex::penguins_na.rm$flipper_length_mm > 185) peng_chm_sub <- subset_cheem(peng_chm, num_idx) lapply(peng_chm_sub, NROW) ## Notice that $global_view_df and $decode_df have fewer rows.
library(cheem) ## Classification setup X <- spinifex::penguins_na.rm[, 1:4] Y <- spinifex::penguins_na.rm$species clas <- spinifex::penguins_na.rm$species ## Cheem peng_chm <- cheem_ls(X, Y, penguin_xgb_shap, penguin_xgb_pred, clas, label = "Penguins, xgb, shapviz") lapply(peng_chm, NROW) ## subset a cheem list num_idx <- which(spinifex::penguins_na.rm$flipper_length_mm > 185) peng_chm_sub <- subset_cheem(peng_chm, num_idx) lapply(peng_chm_sub, NROW) ## Notice that $global_view_df and $decode_df have fewer rows.
Extract and formats the 1D attribution basis from the provided local explanation.
sug_basis(attr_df, rownum)
sug_basis(attr_df, rownum)
attr_df |
A data frame of local explanation attributions. |
rownum |
The rownumber of the observation. Typically primary or comparison observations. |
A matrix of the 1D basis.
Other cheem utility:
as_logical_index()
,
color_scale_of()
,
contains_nonnumeric()
,
is_discrete()
,
is_diverging()
,
linear_tform()
,
logistic_tform()
,
problem_type()
,
rnorm_from()
,
sug_manip_var()
library(cheem) ## Attribution basis of the primary instance sug_basis(ames_rf_shap, rownum = 1) ## This can be used to find a basis to start the radial tour. # ?radial_cheem_tour ## Regression setup: dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## radial_cheem_tour() ames_rf_chm <- cheem_ls(X, Y, ames_rf_shap, ames_rf_pred, clas, label = "North Ames, RF, SHAP") bas <- sug_basis(ames_rf_shap, 1) mv <- sug_manip_var(ames_rf_shap, 1, 2) ggt <- radial_cheem_tour(ames_rf_chm, basis = bas, manip_var = mv) if(interactive()){ ## As a plotly html widget spinifex::animate_plotly(ggt) ## As a gganimation spinifex::animate_gganimate(ggt, render = gganimate::av_renderer()) } ## radial_cheem_tour is also used in: ?run_app()
library(cheem) ## Attribution basis of the primary instance sug_basis(ames_rf_shap, rownum = 1) ## This can be used to find a basis to start the radial tour. # ?radial_cheem_tour ## Regression setup: dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## radial_cheem_tour() ames_rf_chm <- cheem_ls(X, Y, ames_rf_shap, ames_rf_pred, clas, label = "North Ames, RF, SHAP") bas <- sug_basis(ames_rf_shap, 1) mv <- sug_manip_var(ames_rf_shap, 1, 2) ggt <- radial_cheem_tour(ames_rf_chm, basis = bas, manip_var = mv) if(interactive()){ ## As a plotly html widget spinifex::animate_plotly(ggt) ## As a gganimation spinifex::animate_gganimate(ggt, render = gganimate::av_renderer()) } ## radial_cheem_tour is also used in: ?run_app()
Suggest the number of the variable with the largest difference in attribution between the primary and comparison observations.
sug_manip_var(attr_df, primary_obs, comparison_obs)
sug_manip_var(attr_df, primary_obs, comparison_obs)
attr_df |
A data frame of local explanation attributions. |
primary_obs |
The rownumber of the primary observation. Its local attribution becomes the 1d projection basis, and the point it highlighted as a dashed line. |
comparison_obs |
The rownumber of the comparison observation. Point is highlighted as a dotted line. |
A single number of the variable with the largest difference of attribution (basis) variables.
Other cheem utility:
as_logical_index()
,
color_scale_of()
,
contains_nonnumeric()
,
is_discrete()
,
is_diverging()
,
linear_tform()
,
logistic_tform()
,
problem_type()
,
rnorm_from()
,
sug_basis()
library(cheem) ## Regression setup dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## Suggest the number of a variable to manipulate sug_manip_var(ames_rf_shap, primary_obs = 1, comparison_obs = 2) ## This can be used to find a basis to start the radial tour. # ?radial_cheem_tour
library(cheem) ## Regression setup dat <- amesHousing2018_NorthAmes X <- dat[, 1:9] Y <- dat$SalePrice clas <- dat$SubclassMS ## Suggest the number of a variable to manipulate sug_manip_var(ames_rf_shap, primary_obs = 1, comparison_obs = 2) ## This can be used to find a basis to start the radial tour. # ?radial_cheem_tour