---
title: "Introduction to Fluxtools"
author: "Kesondra Key"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to Fluxtools}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup}
library(fluxtools)
```

# Overview

**fluxtools** is an R package that provides an interactive, Shiny-based QA/QC environment for exploring and removing data in the AmeriFlux BASE (or FLUXNET) format. In just a few clicks, you can:

1. Upload eddy covariance data as a CSV file (following AmeriFlux standard naming and timestamp conventions)
2. Plot any numeric column against time, or two numeric columns against each other
3. Highlight statistical outliers (±σ residuals from a linear fit) and add them to your point-removal R code
4. Manually select data points with a lasso or box tool; flagged points are added to the accumulated removal code
5. Copy and paste the generated code into your own R script for reproducible QA/QC
6. Download a cleaned CSV, in which values removed with “Apply Removals” are set to `NA`, together with an R script for reproducibility

This vignette shows you how to install, launch, and use the main Shiny app, `run_flux_qaqc()`, and walks through a typical workflow.

---

# Installation

You can install **fluxtools** from CRAN or directly from GitHub:

```{r, eval=FALSE}
# Install from CRAN
install.packages("fluxtools")

# Install from GitHub (requires the devtools package)
devtools::install_github("kesondrakey/fluxtools")
```

# Launching the Shiny App

Load **fluxtools** and launch the QA/QC application:

```{r, eval=FALSE}
library(fluxtools)

# Run the app
run_fluxtools()
```

## Example workflow

1. **Upload**: Select your AmeriFlux-style CSV (e.g., `US_VT1_HH_202401010000_202501010000.csv`). Files can be up to 1 GB, although large files may make the Shiny interface slower.
2. **Choose year(s)**: “All” is selected by default, but you can subset to specific years.
3. **Choose variables**: `TIMESTAMP_START` is on the x-axis by default.
   Change the y-axis to your variable of interest (e.g., `FC_1_1_1`). The generated R code removes values from the y-axis variable only.
4. **Select data**: Use the box or lasso to select points. This populates the “Current” code box with **dplyr** code like:

```{r, eval=FALSE}
df <- df %>%
  mutate(
    FC_1_1_1 = case_when(
      TIMESTAMP_START == '202401261830' ~ NA_real_,
      TIMESTAMP_START == '202401270530' ~ NA_real_,
      …
      TRUE ~ FC_1_1_1
    )
  )
```

5. **Flag data and accumulate code**: With the points still selected, click “Flag data.” The selected points turn orange and their code is appended to the “Accumulated” box, so you can make multiple selections per session.
6. **Unflag data**: Use the box or lasso to deselect flagged points and remove them from the Accumulated code box.
7. **Clear selection**: Click “Clear Selection” to reset all selections for the current y-variable.
8. **Switch variables**: Change y to another variable (e.g., `SWC_1_1_1`), select more points, and click “Flag data.” Code for both variables now appears:

```{r, eval=FALSE}
df <- df %>%
  mutate(
    FC_1_1_1 = case_when(
      TIMESTAMP_START == '202401261830' ~ NA_real_,
      TIMESTAMP_START == '202401270530' ~ NA_real_,
      …
      TRUE ~ FC_1_1_1
    )
  )

df <- df %>%
  mutate(
    SWC_1_1_1 = case_when(
      TIMESTAMP_START == '202403261130' ~ NA_real_,
      TIMESTAMP_START == '202403270800' ~ NA_real_,
      …
      TRUE ~ SWC_1_1_1
    )
  )
```

9. **Compare variables**: Choose the variables you want to compare (e.g., set y to `TA_1_1_1` and x to `T_SONIC_1_1_1`). The app reports the R² of a simple linear regression. The upper R² uses all points (before removals); once data are selected, a second R² appears, computed as if the selected points had been removed.
10. **Highlight outliers**: Use the slider to set the ±σ residual threshold. Click “Select all ±σ outliers” to append them to the Accumulated code, or click “Clear ±σ outliers” to deselect them and remove them from the code box.
11. **Copy all**: Click the copy icon to the right of the Current or Accumulated code box and paste the code into your own R script for documentation.
12. **Apply removals**: Click “Apply Removals” to set the selected points of the current y-variable to `NA`. The values are removed from view and written to a new cleaned CSV, available through “Export cleaned data”; the raw data are unaffected.
13. **Reload original data**: Made a mistake or want a fresh start? Click “Reload original data” to reload the uploaded CSV and start over.
14. **Export cleaned data**: Download a ZIP containing:
    - A cleaned CSV (with applied `NA`s)
    - An R script that reproduces your removals
    - A PRM summary, if the PRM was used (see below)

# Physical Range Module (PRM)

The **Physical Range Module (PRM)** sets out-of-range values to `NA`. Each rule applies to every column whose name matches the variable's pattern, such as `^SWC($|_)` or `^P($|_)`. Columns whose names contain `"QC"` are skipped by default. No columns are dropped; only out-of-range values are replaced.

Source of ranges: *AmeriFlux Technical Documents, Table A1 (Physical Range Module)*.
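To make the pattern-matching idea concrete, here is a minimal base-R sketch of a regex-based range filter. It is not the fluxtools implementation: the helper `prm_sketch()`, the three-row `rules` table, and the "first matching rule wins" behaviour are illustrative assumptions. The bounds (SWC and RH 0 to 100, P 0 to 50) were chosen to agree with the out-of-range values flagged in the Quick start demo below; use `get_prm_rules()` for the actual ranges and `apply_prm()` on real data.

```r
# Hypothetical sketch of a regex-based physical range filter (NOT fluxtools' code).
# Illustrative rules only; see fluxtools::get_prm_rules() for the real table.
rules <- data.frame(
  pattern = c("^SWC($|_)", "^P($|_)", "^RH($|_)"),
  min     = c(0, 0, 0),
  max     = c(100, 50, 100),
  stringsAsFactors = FALSE
)

prm_sketch <- function(df, rules) {
  n_replaced <- integer(0)
  for (col in names(df)) {
    if (grepl("QC", col, fixed = TRUE)) next   # skip QC columns
    if (!is.numeric(df[[col]])) next           # only numeric data columns
    # Which rule patterns match this column name?
    hit <- which(vapply(rules$pattern, grepl, logical(1), x = col))
    if (length(hit) == 0) next                 # no rule for this column
    r <- rules[hit[1], ]                       # assumption: first matching rule wins
    x <- df[[col]]
    bad <- !is.na(x) & (x < r$min | x > r$max) # out-of-range, non-missing values
    x[bad] <- NA                               # replace values; never drop the column
    df[[col]] <- x
    n_replaced[col] <- sum(bad)                # per-column count of replacements
  }
  list(data = df, n_replaced = n_replaced)
}

# Usage on a tiny made-up data frame
df <- data.frame(
  SWC_1_1_1 = c(10, 150, -3),
  P         = c(0, 60, 5),
  RH_1_1_1  = c(50, 110, 100),
  SWC_QC    = c(0, 1, 2)
)
res <- prm_sketch(df, rules)
res$data
res$n_replaced
```

Anchoring each pattern with `^` and `($|_)` keeps the precipitation rule for `P` from also catching columns such as `PA` or `PPFD_IN`, while still matching qualified names like `P_1_1_1`.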
## Quick start

```{r, eval=FALSE}
# Tiny demo dataset with a few out-of-range values
set.seed(1)
df <- tibble::tibble(
  TIMESTAMP_START = seq.POSIXt(as.POSIXct("2024-01-01", tz = "UTC"),
                               length.out = 10, by = "30 min"),
  SWC_1_1_1 = c(10, 20, 150, NA, 0.5, 99, 101, 50, 80, -3), # bad: 150, 101, -3; 0.5 triggers SWC unit note
  P         = c(0, 10, 60, NA, 51, 3, 0, 5, 100, -1),       # bad: 60, 51, 100, -1
  RH_1_1_1  = c(10, 110, 50, NA, 0, 100, -5, 101, 75, 30),  # bad: 110, -5, 101
  SWC_QC    = sample(0:2, 10, replace = TRUE)               # QC column should be ignored
)

# View the Physical Range Module (PRM) rules
get_prm_rules()

# Apply the PRM to all matching columns
res <- apply_prm(df)

# PRM summary (count and % of values replaced per column)
res$summary

# Apply the PRM to SWC columns only
res_swc <- apply_prm(df, include = "SWC")

# Apply the PRM to SWC and P columns only
res_swc_p <- apply_prm(df, include = c("SWC", "P"))
```

## Physical Range Module Values

```{r prm_rules_table_final, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
tbl <- fluxtools::get_prm_rules()

# Drop the regex column: its "|" characters break the rendered table
tbl <- tbl[, c("variable", "min", "max", "description", "units")]
names(tbl) <- c("Variable", "Min", "Max", "Description", "Units")

# Plain HTML table from knitr (no extra packages); format = "html" forces HTML output
knitr::kable(tbl, format = "html", escape = TRUE)
```

*Fluxtools is an independent project and is not affiliated with or endorsed by the AmeriFlux Network. “AmeriFlux” is a registered trademark of Lawrence Berkeley National Laboratory and is used here for identification purposes only.*