---
title: "ROC curves with cutpointr"
author: "Christian Thiele"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{ROC curves with cutpointr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(fig.width = 6, fig.height = 5, fig.align = "center")
options(rmarkdown.html_vignette.check_title = FALSE)
load("vignettedata/vignettedata.Rdata")
```

## Calculating only the ROC curve

When running `cutpointr`, a ROC curve is by default returned in the column `roc_curve`.
This ROC curve can be plotted using `plot_roc`. Alternatively, if only the ROC curve
is desired and no cutpoint needs to be calculated, the ROC curve can be created using
`roc()` and plotted using `plot_roc`. The `roc` function, unlike `cutpointr`, does not
determine `direction`, `pos_class` or `neg_class` automatically, so all three have to
be specified.

```{r, fig.width=4, fig.height=3}
library(cutpointr)
# direction, pos_class and neg_class must be given explicitly for roc()
roc_curve <- roc(data = suicide, x = dsi, class = suicide,
                 pos_class = "yes", neg_class = "no", direction = ">=")
auc(roc_curve)
head(roc_curve)
plot_roc(roc_curve)
```

## ROC curve and optimal cutpoint for multiple variables

To calculate ROC curves and optimal cutpoints for several predictor variables, we can
map the standard evaluation version `cutpointr` to the column names. We could do this
manually, e.g. using `purrr::map`, but to make this task more convenient
`multi_cutpointr` can be used to achieve the same result. It maps multiple predictor
columns to `cutpointr`, by default all numeric columns except for the class column.
If `direction` and / or `pos_class` and `neg_class` are unspecified, these parameters
will automatically be determined by **cutpointr** so that the AUC values for all
variables will be $> 0.5$.

```{r}
mcp <- multi_cutpointr(suicide, class = suicide, pos_class = "yes",
                       use_midpoints = TRUE, silent = TRUE)
summary(mcp)
```

## Accessing `data`, `roc_curve`, and `boot`

The object returned by `cutpointr` is of the classes `cutpointr`, `tbl_df`, `tbl`, and `data.frame`.
Thus, it can be handled like a usual data frame. The columns `data`, `roc_curve`, and
`boot` consist of nested data frames, which means that these are list columns whose
elements are data frames. They can either be accessed using `[` or by using functions
from the tidyverse. If subgroups were given, the output contains one row per subgroup
and the function that accesses the data should be mapped to every row or the data
should be grouped by subgroup.

```{r, eval = FALSE, message = FALSE}
set.seed(123)
opt_cut_b_g <- cutpointr(suicide, dsi, suicide, gender, boot_runs = 500)
```

```{r, message = FALSE}
library(dplyr)
library(tidyr)
opt_cut_b_g |>
  group_by(subgroup) |>
  select(subgroup, boot) |>
  unnest(cols = boot) |>
  summarise(sd_oc_boot = sd(optimal_cutpoint),
            m_oc_boot = mean(optimal_cutpoint),
            m_acc_oob = mean(acc_oob))
```

## Adding metrics to the result of cutpointr() or roc()

By default, the output of `cutpointr` includes the optimized metric and several other
metrics. The `add_metric` function adds further metrics. Here, we're adding the
negative predictive value (NPV) and the positive predictive value (PPV) at the optimal
cutpoint per subgroup:

```{r}
cutpointr(suicide, dsi, suicide, gender, metric = youden, silent = TRUE) |>
  add_metric(list(ppv, npv)) |>
  select(subgroup, optimal_cutpoint, youden, ppv, npv)
```

In the same fashion, additional metric columns can be added to a `roc_cutpointr` object:

```{r}
roc(data = suicide, x = dsi, class = suicide,
    pos_class = "yes", neg_class = "no", direction = ">=") |>
  add_metric(list(cohens_kappa, F1_score)) |>
  select(x.sorted, tp, fp, tn, fn, cohens_kappa, F1_score) |>
  head()
```
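
To make explicit what such metric columns contain, the following minimal sketch computes PPV and NPV by hand in base R from confusion-matrix counts, assuming the usual definitions $PPV = TP / (TP + FP)$ and $NPV = TN / (TN + FN)$. The counts and the names `counts`, `ppv_manual`, and `npv_manual` are made up for illustration; they are not output of **cutpointr**.

```{r}
# Illustrative confusion-matrix counts at three hypothetical cutpoints
# (made-up numbers, not computed from the suicide data)
counts <- data.frame(
  cutpoint = c(1, 2, 3),
  tp = c(30, 28, 20),
  fp = c(100, 40, 10),
  tn = c(396, 456, 486),
  fn = c(6, 8, 16)
)

# Usual definitions: PPV = TP / (TP + FP), NPV = TN / (TN + FN)
counts$ppv_manual <- counts$tp / (counts$tp + counts$fp)
counts$npv_manual <- counts$tn / (counts$tn + counts$fn)

counts
```

Each row corresponds to one candidate cutpoint, mirroring the one-row-per-cutpoint layout of the `tp`, `fp`, `tn`, and `fn` columns selected above.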