---
title: "Summary of Regression Models as HTML Table"
author: "Daniel Lüdecke"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Summary of Regression Models as HTML Table}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE)

if (!requireNamespace("sjlabelled", quietly = TRUE) ||
    !requireNamespace("sjmisc", quietly = TRUE) ||
    !requireNamespace("lme4", quietly = TRUE) ||
    !requireNamespace("pscl", quietly = TRUE) ||
    !requireNamespace("glmmTMB", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
} else {
  knitr::opts_chunk$set(eval = TRUE)
  library(sjPlot)
}

```

`tab_model()` is the pendant to `plot_model()`, however, instead of creating plots, `tab_model()` creates HTML-tables that will be displayed either in your IDE's viewer-pane, in a web browser or in a knitr-markdown-document (like this vignette).

HTML is the only output-format, you can't (directly) create a LaTex or PDF output from `tab_model()` and related table-functions. However, it is possible to easily export the tables into Microsoft Word or Libre Office Writer.

This vignette shows how to create table from regression models with `tab_model()`. There's a dedicated vignette that demonstrate how to change the [table layout and appearance with CSS](table_css.html).

**Note!** Due to the custom CSS, the layout of the table inside a knitr-document differs from the output in the viewer-pane and web browser!

```{r}
# load package
library(sjPlot)
library(sjmisc)
library(sjlabelled)

# sample data
data("efc")
efc <- as_factor(efc, c161sex, c172code)
```

## A simple HTML table from regression results

First, we fit two linear models to demonstrate the `tab_model()`-function.

```{r, results='hide'}
m1 <- lm(barthtot ~ c160age + c12hour + c161sex + c172code, data = efc)
m2 <- lm(neg_c_7 ~ c160age + c12hour + c161sex + e17age, data = efc)
``` 

The simplest way of producing the table output is by passing the fitted model as parameter. By default, estimates, confidence intervals (_CI_) and p-values (_p_) are reported. As summary, the numbers of observations as well as the R-squared values are shown.

```{r}
tab_model(m1)
```

## Automatic labelling

As the **sjPlot**-packages features [labelled data](https://strengejacke.github.io/sjlabelled/), the coefficients in the table are already labelled in this example. The name of the dependent variable(s) is used as main column header for each model. For non-labelled data, the coefficient names are shown.

```{r}
data(mtcars)
m.mtcars <- lm(mpg ~ cyl + hp + wt, data = mtcars)
tab_model(m.mtcars)
```

If factors are involved and `auto.label = TRUE`, "pretty" parameters names are used (see [`format_parameters()`](https://easystats.github.io/parameters/reference/format_parameters.html).

```{r}
set.seed(2)
dat <- data.frame(
  y = runif(100, 0, 100),
  drug = as.factor(sample(c("nonsense", "useful", "placebo"), 100, TRUE)),
  group = as.factor(sample(c("control", "treatment"), 100, TRUE))
)

pretty_names <- lm(y ~ drug * group, data = dat)
tab_model(pretty_names)
```

### Turn off automatic labelling

To turn off automatic labelling, use `auto.label = FALSE`, or provide an empty character vector for `pred.labels` and `dv.labels`.

```{r}
tab_model(m1, auto.label = FALSE)
```

Same for models with non-labelled data and factors.

```{r}
tab_model(pretty_names, auto.label = FALSE)
```

## More than one model

`tab_model()` can print multiple models at once, which are then printed side-by-side. Identical coefficients are matched in a row.

```{r}
tab_model(m1, m2)
```

## Generalized linear models

For generalized linear models, the ouput is slightly adapted. Instead of _Estimates_, the column is named _Odds Ratios_, _Incidence Rate Ratios_ etc., depending on the model. The coefficients are in this case automatically converted (exponentiated). Furthermore, pseudo R-squared statistics are shown in the summary.

```{r}
m3 <- glm(
  tot_sc_e ~ c160age + c12hour + c161sex + c172code, 
  data = efc,
  family = poisson(link = "log")
)

efc$neg_c_7d <- ifelse(efc$neg_c_7 < median(efc$neg_c_7, na.rm = TRUE), 0, 1)
m4 <- glm(
  neg_c_7d ~ c161sex + barthtot + c172code,
  data = efc,
  family = binomial(link = "logit")
)

tab_model(m3, m4)
``` 

### Untransformed estimates on the linear scale

To plot the estimates on the linear scale, use `transform = NULL`. 

```{r}
tab_model(m3, m4, transform = NULL, auto.label = FALSE)
``` 

## More complex models

Other models, like hurdle- or zero-inflated models, also work with `tab_model()`. In this case, the zero inflation model is indicated in the table. Use `show.zeroinf = FALSE` to hide this part from the table.

```{r}
library(pscl)
data("bioChemists")
m5 <- zeroinfl(art ~ fem + mar + kid5 + ment | kid5 + phd + ment, data = bioChemists)

tab_model(m5)
```

You can combine any model in one table.

```{r}
tab_model(m1, m3, m5, auto.label = FALSE, show.ci = FALSE)
```

## Show or hide further columns

`tab_model()` has some argument that allow to show or hide specific columns from the output:

* `show.est` to show/hide the column with model estimates.
* `show.ci` to show/hide the column with confidence intervals.
* `show.se` to show/hide the column with standard errors.
* `show.std` to show/hide the column with standardized estimates (and their standard errors).
* `show.p` to show/hide the column with p-values.
* `show.stat` to show/hide the column with the coefficients' test statistics.
* `show.df` for linear mixed models, when p-values are based on degrees of freedom with Kenward-Rogers approximation, these degrees of freedom are shown.

### Adding columns

In the following example, standard errors, standardized coefficients and test statistics are also shown.

```{r}
tab_model(m1, show.se = TRUE, show.std = TRUE, show.stat = TRUE)
``` 

### Removing columns

In the following example, default columns are removed.

```{r}
tab_model(m3, m4, show.ci = FALSE, show.p = FALSE, auto.label = FALSE)
``` 

### Removing and sorting columns

Another way to remove columns, which also allows to reorder the columns, is the `col.order`-argument. This is a character vector, where each element indicates a column in the output. The value `"est"`, for instance, indicates the estimates, while `"std.est"` is the column for standardized estimates and so on.

By default, `col.order` contains all possible columns. All columns that should shown (see previous tables, for example using `show.se = TRUE` to show standard errors, or `show.st = TRUE` to show standardized estimates) are then printed by default. Colums that are _excluded_ from `col.order` are _not shown_, no matter if the `show*`-arguments are `TRUE` or `FALSE`. So if `show.se = TRUE`, but`col.order` does not contain the element `"se"`, standard errors are not shown. On the other hand, if `show.est = FALSE`, but `col.order` _does include_ the element `"est"`, the columns with estimates are not shown.

In summary, `col.order` can be used to _exclude_ columns from the table and to change the order of colums.

```{r}
tab_model(
  m1, show.se = TRUE, show.std = TRUE, show.stat = TRUE,
  col.order = c("p", "stat", "est", "std.se", "se", "std.est")
)
``` 

### Collapsing columns

With `collapse.ci` and `collapse.se`, the columns for confidence intervals and standard errors can be collapsed into one column together with the estimates. Sometimes this table layout is required.

```{r}
tab_model(m1, collapse.ci = TRUE)
``` 

## Defining own labels

There are different options to change the labels of the column headers or coefficients, e.g. with:

* `pred.labels` to change the names of the coefficients in the _Predictors_ column. Note that the length of `pred.labels` must exactly match the amount of predictors in the _Predictor_ column.
* `dv.labels` to change the names of the model columns, which are labelled with the variable labels / names from the dependent variables.
* Further more, there are various `string.*`-arguments, to change the name of column headings.

```{r}
tab_model(
  m1, m2, 
  pred.labels = c("Intercept", "Age (Carer)", "Hours per Week", "Gender (Carer)",
                  "Education: middle (Carer)", "Education: high (Carer)", 
                  "Age (Older Person)"),
  dv.labels = c("First Model", "M2"),
  string.pred = "Coeffcient",
  string.ci = "Conf. Int (95%)",
  string.p = "P-Value"
)
``` 

## Including reference level of categorical predictors

By default, for categorical predictors, the variable names and the categories for regression coefficients are shown in the table output. 

```{r}
library(glmmTMB)
data("Salamanders")
model <- glm(
  count ~ spp + Wtemp + mined + cover,
  family = poisson(),
  data = Salamanders
)

tab_model(model)
```

You can include the reference level for categorical predictors by setting `show.reflvl = TRUE`.

```{r}
tab_model(model, show.reflvl = TRUE)
```

To show variable names, categories and include the reference level, also set `prefix.labels = "varname"`.

```{r}
tab_model(model, show.reflvl = TRUE, prefix.labels = "varname")
```

## Style of p-values

You can change the style of how p-values are displayed with the argument `p.style`. With `p.style = "stars"`, the p-values are indicated as `*` in the table. 

```{r}
tab_model(m1, m2, p.style = "stars")
``` 

Another option would be scientific notation, using `p.style = "scientific"`, which also can be combined with `digits.p`.

```{r}
tab_model(m1, m2, p.style = "scientific", digits.p = 2)
``` 


### Automatic matching for named vectors

Another way to easily assign labels are _named vectors_. In this case, it doesn't matter if `pred.labels` has more labels than coefficients in the model(s), or in which order the labels are passed to `tab_model()`. The only requirement is that the labels' names equal the coefficients names as they appear in the `summary()`-output.

```{r}
# example, coefficients are "c161sex2" or "c172code3"
summary(m1)

pl <- c(
  `(Intercept)` = "Intercept",
  e17age = "Age (Older Person)",
  c160age = "Age (Carer)", 
  c12hour = "Hours per Week", 
  barthtot = "Barthel-Index",
  c161sex2 = "Gender (Carer)",
  c172code2 = "Education: middle (Carer)", 
  c172code3 = "Education: high (Carer)",
  a_non_used_label = "We don't care"
)
 
tab_model(
  m1, m2, m3, m4, 
  pred.labels = pl, 
  dv.labels = c("Model1", "Model2", "Model3", "Model4"),
  show.ci = FALSE, 
  show.p = FALSE, 
  transform = NULL
)
``` 

## Keep or remove coefficients from the table

Using the `terms`- or `rm.terms`-argument allows us to explicitly show or remove specific coefficients from the table output.

```{r}
tab_model(m1, terms = c("c160age", "c12hour"))
``` 

Note that the names of terms to keep or remove should match the coefficients names. For categorical predictors, one example would be:

```{r}
tab_model(m1, rm.terms = c("c172code2", "c161sex2"))
```