---
title: "Introduction to crandep"
date: "2023-08-17"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to crandep}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This vignette introduces the functions that facilitate the analysis of dependencies between CRAN packages, specifically `get_dep()` and `df_to_graph()`.

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, message = FALSE}
library(crandep)
library(dplyr)
library(igraph)
```

## One or multiple types of dependencies

To obtain information about the various kinds of dependencies of a package, we can use the function `get_dep()`, which takes the package name and the type(s) of dependencies as its first and second arguments, respectively. Currently, the second argument accepts a character vector of one or more of the following words: `Depends`, `Imports`, `LinkingTo`, `Suggests` and `Enhances`. Letter case is ignored, and `LinkingTo` may also be written as `Linking_To` or `Linking To`.

```{r}
get_dep("dplyr", "Imports")
get_dep("MASS", c("depends", "suggests"))
```

For more information on the different types of dependencies, see [the official guidelines](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-Dependencies) and [https://r-pkgs.org/description.html](https://r-pkgs.org/description.html).

In the output, the column `type` contains the type of the dependency converted to lower case. In addition, `LinkingTo` is converted to `linking to` for consistency.
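The matching rules above can be restated as a small R function. `normalise_type()` is a hypothetical helper written only for this illustration: it is not exported by crandep and is not necessarily how `get_dep()` processes its second argument internally. It lower-cases the input, maps `LinkingTo`, `Linking_To` and `Linking To` to the label `linking to` used in the `type` column, and rejects any other word.

```{r}
# Hypothetical helper for illustration only; NOT crandep's internal code.
# Maps user-supplied dependency types to the lower-case labels that
# appear in the `type` column of the output of get_dep().
normalise_type <- function(x) {
  labels <- c("depends", "imports", "linking to", "suggests", "enhances")
  x <- tolower(x)                       # ignore letter case
  x <- gsub("_", " ", x, fixed = TRUE)  # "linking_to" -> "linking to"
  x[x == "linkingto"] <- "linking to"   # "linkingto"  -> "linking to"
  unknown <- setdiff(x, labels)
  if (length(unknown) > 0L) {
    stop("unknown dependency type(s): ", paste(unknown, collapse = ", "))
  }
  x
}
normalise_type(c("Imports", "LinkingTo", "Linking_To", "Linking To"))
```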
```{r}
get_dep("xts", "LinkingTo")
get_dep("xts", "linking to")
```

For reverse dependencies, instead of prefixing `type` with "Reverse ", we set the argument `reverse = TRUE`:

```{r}
get_dep("abc", "depends", reverse = TRUE)
get_dep("xts", "linking to", reverse = TRUE)
```

In principle, for each forward dependency

```{r, echo=FALSE}
data.frame(from = "A", to = "B", type = "c", reverse = FALSE)
```

there should be an equivalent reverse dependency

```{r, echo=FALSE}
data.frame(from = "B", to = "A", type = "c", reverse = TRUE)
```

Because forward and reverse dependencies share the same `type` labels, this correspondence is easy to check: swap `from` and `to` in one table and compare it with the other.

To obtain all types of dependencies, we can pass `"all"` as the second argument instead of a character vector of all 5 words:

```{r}
df0.rstan <- get_dep("rstan", "all")
dplyr::count(df0.rstan, type)
df1.rstan <- get_dep("rstan", "all", reverse = TRUE) # too many rows to display
dplyr::count(df1.rstan, type) # hence the summary using count()
```

## Building and visualising a dependency network

To build a dependency network, we need the dependencies of multiple packages. For illustration, we take the [core packages of the tidyverse](https://www.tidyverse.org/packages/) and find out what each of them `Imports`. We combine all the dependencies into one data frame, in which the package in the `from` column imports the package in the `to` column. This data frame is essentially the edge list of the dependency network.

```{r}
# One row per (importing package, imported package) pair
df0.imports <- rbind(
  get_dep("ggplot2", "Imports"),
  get_dep("dplyr", "Imports"),
  get_dep("tidyr", "Imports"),
  get_dep("readr", "Imports"),
  get_dep("purrr", "Imports"),
  get_dep("tibble", "Imports"),
  get_dep("stringr", "Imports"),
  get_dep("forcats", "Imports")
)
head(df0.imports)
tail(df0.imports)
```

With the help of the 'igraph' package, we can use this data frame to build a graph object that represents the dependency network.
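Before building the real network, it may help to see how `igraph::graph_from_data_frame()` treats an edge list of this shape. The sketch below uses a tiny, made-up edge list (the package names `pkgA`, `pkgB` and `pkgC` are hypothetical) in the same `from`/`to`/`type`/`reverse` layout as above. The first two columns become directed edges and the remaining columns are stored as edge attributes, so the in-degree of a node counts the packages in the network that import it. Adding a single `Suggests`-style edge back to `pkgA` creates a cycle, which `is_dag()` then detects.

```{r}
# Toy edge list with hypothetical package names:
# pkgA imports pkgB and pkgC; pkgB imports pkgC.
df.toy <- data.frame(
  from = c("pkgA", "pkgA", "pkgB"),
  to = c("pkgB", "pkgC", "pkgC"),
  type = "imports",
  reverse = FALSE,
  stringsAsFactors = FALSE
)
# The first two columns define the directed edges (from -> to);
# `type` and `reverse` are kept as edge attributes.
g.toy <- igraph::graph_from_data_frame(df.toy)
igraph::vcount(g.toy)               # number of packages
igraph::ecount(g.toy)               # number of dependencies
igraph::edge_attr(g.toy, "type")
igraph::degree(g.toy, mode = "in")  # how many packages in the network import each one

# A hypothetical Suggests edge from pkgC back to pkgA closes the cycle
# pkgA -> pkgB -> pkgC -> pkgA.
g.toy2 <- igraph::add_edges(g.toy, c("pkgC", "pkgA"), type = "suggests", reverse = FALSE)
igraph::is_dag(g.toy)   # imports only: no cycle
igraph::is_dag(g.toy2)  # with the extra edge: cycle present
```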
```{r, out.width="660px", out.height="660px", fig.width=12, fig.height=12, fig.show="hold"}
g0.imports <- igraph::graph_from_data_frame(df0.imports)
set.seed(1457L) # the default layout is random; fix it for reproducibility
old.par <- par(mar = rep(0.0, 4)) # remove the plot margins
plot(g0.imports, vertex.label.cex = 1.5)
par(old.par)
```

A network of `Imports` is expected to be a directed acyclic graph (DAG). We can check this with the 'igraph' function `is_dag()`.

```{r}
igraph::is_dag(g0.imports)
```

Note that acyclicity is expected only for `Imports` (and `Depends`): these are hard dependencies that must be available before the package itself can be loaded, so they cannot form a cycle. A network of, for example, `Suggests` need not be acyclic.

## Boundary and giant component

Using the function `df_to_graph()`, we can set a boundary on the nodes to which the edges are directed. Its second argument takes a data frame whose column `name` lists these nodes. Here the boundary is the set of core tidyverse packages, so the resulting network shows only how they import one another. The giant component in the section title is the largest (weakly) connected component of a network; see `?df_to_graph` for how the function handles it.

```{r, out.width="660px", out.height="660px", fig.width=12, fig.height=12, fig.show="hold"}
df0.nodes <- data.frame(
  name = c("ggplot2", "dplyr", "tidyr", "readr", "purrr", "tibble", "stringr", "forcats"),
  stringsAsFactors = FALSE
)
g0.core <- df_to_graph(df0.imports, df0.nodes)
set.seed(259L) # fix the random layout
old.par <- par(mar = rep(0.0, 4))
plot(g0.core, vertex.label.cex = 1.5)
par(old.par)
```

## Going forward

In [this other vignette](cran.html), we show how to obtain the dependency network of **all** CRAN packages using other functions in the package. The number of reverse dependencies of each package can then be [modelled](degree.html).