| Type: | Package | 
| Title: | Nonlinear Network, Clustering, and Variable Selection Based on DCOL | 
| Version: | 1.4 | 
| Date: | 2020-10-16 | 
| Author: | Tianwei Yu, Haodong Liu | 
| Maintainer: | Tianwei Yu<yutianwei@cuhk.edu.cn> | 
| Description: | It includes four methods: DCOL-based K-profiles clustering, non-linear network reconstruction, non-linear hierarchical clustering, and variable selection for generalized additive model. References: Tianwei Yu (2018)<doi:10.1002/sam.11381>; Haodong Liu and others (2016)<doi:10.1371/journal.pone.0158247>; Kai Wang and others (2015)<doi:10.1155/2015/918954>; Tianwei Yu and others (2010)<doi:10.1109/TCBB.2010.73>. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| Imports: | ROCR, TSP, igraph, fdrtool, coin, methods, graphics, stats, earth, randomForest, e1071 | 
| NeedsCompilation: | no | 
| Packaged: | 2020-10-17 04:39:26 UTC; tyu8 | 
| Repository: | CRAN | 
| Date/Publication: | 2020-10-19 23:10:05 UTC | 
Implementation of K-Profiles Clustering
Description
Implementation of DCOL-based K-profiles clustering.
Usage
 KPC(dataset, nCluster, maxIter = 100, p.max = 0.2, p.min = 0.05)
Arguments
dataset | 
 the data matrix, with genes in rows and samples in columns  | 
nCluster | 
 the number of clusters K  | 
maxIter | 
 the maximum number of iterations  | 
p.max | 
 the starting p-value cutoff to exclude noise genes  | 
p.min | 
 the final p-value cutoff to exclude noise genes  | 
Value
Returns a list with the gene clustering and a list of p values.
cluster | 
 the gene clustering  | 
p.list | 
 a list of p values  | 
Author(s)
Tianwei Yu <tianwei.yu@emory.edu>
References
http://www.hindawi.com/journals/bmri/aa/918954/
Examples
 ## generate a data matrix with hidden clusters as sample input
 input<-data.gen(n.genes=40, n.grps=4)
 ## input now holds both the data matrix and the hidden clusters; keep only the matrix
 input<-input$data
 
 ## set nCluster to 4
 kpc<-KPC(input,nCluster=4)
  
 ## get the clustering found by KPC
 cluster<-kpc$cluster
 ## get the list of p values
 p<-kpc$p.list
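To judge how well a clustering recovers the hidden groups, the two label sets can be cross-tabulated. The sketch below uses made-up label vectors as stand-ins; it assumes that kpc$cluster and the grps component returned by data.gen() are per-gene label vectors of equal length, which is not stated in this manual. The purity score is an illustrative summary, not something KPC computes.

```r
# Hypothetical labels; in practice these would come from
# data.gen(...)$grps and kpc$cluster (assumed to be per-gene labels).
truth     <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
recovered <- c(2, 2, 2, 1, 1, 3, 3, 3, 3)

# Rows are hidden groups, columns are recovered clusters.
# Cluster numbers are arbitrary, so only the block pattern matters.
agreement <- table(truth = truth, recovered = recovered)
print(agreement)

# Share of genes that fall in the dominant cluster of their hidden group.
purity <- sum(apply(agreement, 1, max)) / length(truth)
print(purity)
```

A table with one large count per row indicates good recovery regardless of how the clusters are numbered.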
Simulated Data Generation
Description
Generates a simulated gene matrix to use as example input.
Usage
 
data.gen(n.genes=100, n.samples=100, n.grps=10, aver.grp.size=10, 
n.fun.types=6, epsilon=0.1, n.depend=0)
Arguments
n.genes | 
 the number of rows of the matrix.  | 
n.samples | 
 the number of columns of the matrix.  | 
n.grps | 
 the number of hidden clusters.  | 
aver.grp.size | 
 average number of genes in a cluster.  | 
n.fun.types | 
 number of function types to use.  | 
epsilon | 
 noise level.  | 
n.depend | 
 the dependence structure used in data generation. Can be 0, 1, or 2.  | 
Details
The data generation scheme is described in detail in IEEE ACM Trans. Comput. Biol. Bioinform. 10(4):1080-85.
Value
Returns a list with the gene matrix and the cluster assignment.
data | 
 the gene matrix  | 
grps | 
 the hidden cluster assignment used to generate the data  | 
Author(s)
Tianwei Yu<tyu8@emory.edu>
Examples
## generate a gene matrix with 100 genes, some of them in 5 clusters, and 10 samples per gene
output<-data.gen(n.genes=100, n.samples=10, n.grps=5)
## get the gene matrix from the output
matrix<-output$data
## get the hidden clusters from the output
grps<-output$grps
Non-Linear Hierarchical Clustering
Description
The non-linear hierarchical clustering based on DCOL
Usage
nlhc(array, hamil.method = "nn", concorde.path = NA, 
use.normal.approx = FALSE, normalization = "standardize", combine.linear = TRUE,
use.traditional.hclust = FALSE, method.traditional.hclust = "average")
Arguments
array | 
 the data matrix with no missing values  | 
hamil.method | 
 the method to find the Hamiltonian path.  | 
concorde.path | 
 If using the Concorde TSP Solver, the local directory of the solver  | 
use.normal.approx | 
 whether to use the normal approximation for the null hypothesis.  | 
normalization | 
 the normalization method for the array.  | 
combine.linear | 
 whether linear association should be found by correlation to combine with nonlinear association found by DCOL.  | 
use.traditional.hclust | 
 whether traditional agglomerative clustering should be used.  | 
method.traditional.hclust | 
 the method to pass on to hclust() if traditional method is chosen.  | 
Details
hamil.method: It is passed on to the function tsp of the library TSP. To use the linkern method, the user needs to install Concorde as instructed in TSP.
use.normal.approx: If TRUE, the normal approximation is used for every feature, and all covariances are assumed to be zero. If FALSE, a permutation-based null distribution is generated (a mean vector and a variance-covariance matrix).
normalization: There are three choices. "standardize" removes the mean of each row and scales its standard deviation to one; "normal_score" applies the normal score transformation; "none" does nothing. With "none", it is still assumed that the user has already normalized the data so that each row has approximately mean 0 and sd 1.
combine.linear: The two pieces of information are combined at the start to initialize the distance matrix.
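The two active normalization choices described above can be pictured with a few lines of base R. This is a sketch only: the helper names standardize_rows and normal_score_rows are hypothetical, and the rank-based formula used for the normal score is a common definition assumed here, not necessarily the one the package uses.

```r
# Toy matrix: 3 genes (rows) by 6 samples (columns).
set.seed(1)
x <- matrix(rexp(18), nrow = 3)

# "standardize": center each row and scale it to unit standard deviation.
standardize_rows <- function(m) {
  t(scale(t(m)))   # scale() works on columns, so transpose in and out
}

# "normal_score": replace each value by the normal quantile of its
# within-row rank (assumed definition, for illustration only).
normal_score_rows <- function(m) {
  t(apply(m, 1, function(r) qnorm(rank(r) / (length(r) + 1))))
}

x_std <- standardize_rows(x)
x_ns  <- normal_score_rows(x)

# After standardizing, every row has mean 0 and sd 1 (up to rounding).
print(round(rowMeans(x_std), 10))
print(apply(x_std, 1, sd))
```

Data prepared this way by the user would correspond to passing normalization="none".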
Value
Returns an hclust object, the same as the output of hclust(). See help(hclust).
merge | 
 an n-1 by 2 matrix. Row i of merge describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation -j was merged at this stage. If j is positive then the merge was with the cluster formed at the (earlier) stage j of the algorithm.  | 
height | 
 a set of n-1 real values, the value of the criterion associated with the clustering method for the particular agglomeration  | 
order | 
 a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches.  | 
labels | 
 labels for each of the objects being clustered  | 
call | 
 the call which produced the result  | 
dist.method | 
 the distance that has been used to create d  | 
height.0 | 
 original calculation of merging height  | 
Author(s)
Tianwei Yu <tianwei.yu@emory.edu>
References
http://www.ncbi.nlm.nih.gov/pubmed/24334400
Examples
 ## generate a data matrix with hidden clusters as sample input
 input<-data.gen(n.genes=40, n.grps=4)
 ## input now holds both the data matrix and the hidden clusters; keep only the matrix
 input<-input$data
 nlhc.data<-nlhc(input)
 plot(nlhc.data)
 ## get the merge matrix from the result
 merge<-nlhc.data$merge 
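Because the return value is described as an hclust object, the usual base R tools for such objects should apply. The sketch below builds a stand-in tree with stats::hclust() on toy data rather than calling nlhc(), and it assumes the merge heights of an nlhc() result are ordered in the way cutree() expects.

```r
# Stand-in hclust object built with stats::hclust() on toy data;
# an object returned by nlhc() is described as having the same structure.
set.seed(2)
toy <- rbind(matrix(rnorm(20, mean = 0),  nrow = 5),
             matrix(rnorm(20, mean = 10), nrow = 5))
rownames(toy) <- paste0("gene", 1:10)
hc <- hclust(dist(toy), method = "average")

# Cut the tree into a fixed number of groups.
groups <- cutree(hc, k = 2)
print(table(groups))

# The merge matrix has one row per merge step, i.e. n - 1 rows.
print(dim(hc$merge))
```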
Non-Linear Network reconstruction from expression matrix
Description
Non-Linear Network reconstruction method
Usage
nlnet(input, min.fdr.cutoff=0.05,max.fdr.cutoff=0.2, conn.proportion=0.007, 
gene.fdr.plot=FALSE, min.module.size=0, gene.community.method="multilevel", 
use.normal.approx=FALSE, normalization="standardize", plot.method="communitygraph")
Arguments
input | 
 the data matrix with no missing values.  | 
min.fdr.cutoff | 
 the minimum allowable value of the local false discovery cutoff in establishing links between genes.  | 
max.fdr.cutoff | 
 the maximum allowable value of the local false discovery cutoff in establishing links between genes.  | 
conn.proportion | 
 the target proportion of connections between all pairs of genes, if allowed by the fdr cutoff limits.  | 
gene.fdr.plot | 
 whether to plot a figure with estimated densities, distribution functions, and (local) false discovery rates.  | 
min.module.size | 
 the minimum number of genes required to form a module.  | 
gene.community.method | 
 the method for community detection.  | 
use.normal.approx | 
 whether to use the normal approximation for the null hypothesis.  | 
normalization | 
 the normalization method for the array.  | 
plot.method | 
 the method for graph and community plotting.  | 
Details
gene.community.method: Three community detection methods are available: "multilevel", "label.propagation" and "leading.eigenvector".
use.normal.approx: If TRUE, the normal approximation is used for every feature, and all covariances are assumed to be zero. If FALSE, a permutation-based null distribution is generated (a mean vector and a variance-covariance matrix).
normalization: There are three choices. "standardize" removes the mean of each row and scales its standard deviation to one; "normal_score" applies the normal score transformation; "none" does nothing. With "none", it is still assumed that the user has already normalized the data so that each row has approximately mean 0 and sd 1.
plot.method: Four plotting options are available: "none" plots no graph, "communitygraph" plots the graph with its communities, "graph" plots the graph only, and "membership" plots the community membership.
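One way to read the interaction of conn.proportion, min.fdr.cutoff and max.fdr.cutoff is that the cutoff is chosen to reach the target proportion of connections and is then kept inside the allowed range. The sketch below illustrates that reading only; choose_cutoff is a hypothetical helper, the local fdr values are simulated, and none of this is taken from the package's source.

```r
# Hypothetical local fdr values, one per gene pair (simulated, not package output).
set.seed(4)
lfdr <- runif(1000)

# Hypothetical helper: pick the cutoff that would connect roughly the target
# proportion of pairs, then clamp it to [min.fdr.cutoff, max.fdr.cutoff].
choose_cutoff <- function(lfdr, conn.proportion = 0.007,
                          min.fdr.cutoff = 0.05, max.fdr.cutoff = 0.2) {
  target <- unname(quantile(lfdr, probs = conn.proportion))
  min(max(target, min.fdr.cutoff), max.fdr.cutoff)
}

cutoff <- choose_cutoff(lfdr)
print(cutoff)

# Realized proportion of connected pairs under the chosen cutoff.
print(mean(lfdr <= cutoff))
```

Under this reading, the realized proportion of connections can differ from conn.proportion whenever the target cutoff falls outside the allowed range.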
Value
Returns a graph and the community membership of the graph.
algorithm | 
 The algorithm name for community detection  | 
graph | 
 An igraph object including edges : Numeric vector defining the edges, the first edge points from the first element to the second, the second edge from the third to the fourth, etc.  | 
community | 
 Numeric vector, one value for each vertex, the membership vector of the community structure.  | 
Author(s)
Haodong Liu <liuhaodong0828@gmail.com>
References
https://www.ncbi.nlm.nih.gov/pubmed/27380516
Examples
 
 ## generate a data matrix with hidden clusters as sample input
 input<-data.gen(n.genes=40, n.grps=4)
 ## input now holds both the data matrix and the hidden clusters; keep only the matrix
 input<-input$data
 ## change the plotting method
 result<-nlnet(input,plot.method="graph")
 ## inspect the result
 graph<-result$graph ## an igraph object
 comm<-result$community ## community membership of the graph
 
 ## use a different community detection method
 #nlnet(input,gene.community.method="label.propagation")
 
 ## change the target connection proportion between genes
 ## and the minimum module size
 #nlnet(input,conn.proportion=0.005,min.module.size=10)
 
Nonlinear Variable Selection based on DCOL
Description
This is a nonlinear variable selection procedure for generalized additive models. It's based on DCOL, using forward stagewise selection. In addition, a cross-validation is conducted to tune the stopping alpha level and finalize the variable selection.
Usage
nvsd(X, y, fold = 10, step.size = 0.01, stop.alpha = 0.05, stop.var.count = 20, 
max.model.var.count = 10, roughening.method = "DCOL", do.plot = F, pred.method = "MARS")
Arguments
X | 
 The predictor matrix. Each row is a gene (predictor), each column is a sample. Notice that this orientation differs from most other packages, where each column is a predictor. This is to conform to the other functions in this package, which handle gene expression type data.  | 
y | 
 The numerical outcome vector.  | 
fold | 
 The fold of cross-validation.  | 
step.size | 
 The step size of the roughening process.  | 
stop.alpha | 
 The alpha level (significance of the current selected predictor) to stop the iterations.  | 
stop.var.count | 
 The maximum number of predictors to select in the forward stagewise selection. Once this number is reached, the iteration stops.  | 
max.model.var.count | 
 The maximum number of predictors to select. Notice this can be smaller than stop.var.count. stop.var.count can be set more leniently, and this parameter controls the final maximum model size.  | 
roughening.method | 
 The method for roughening. The choices are "DCOL" or "spline".  | 
do.plot | 
 Whether to plot the points change in each step.  | 
pred.method | 
 The prediction method for the cross validation variable selection. As forward stagewise procedure doesn't do prediction, a method has to be borrowed from existing packages. The choices include "MARS", "RF", and "SVM".  | 
Details
Please refer to the reference for details.
Value
A list object is returned. The components include the following.
selected.pred | 
 The selected predictors (row number).  | 
all.pred | 
 The selected predictors by the forward stagewise selection. The $selected.pred is a subset of this.  | 
Author(s)
Tianwei Yu<tianwei.yu@emory.edu>
References
https://arxiv.org/abs/1601.05285
See Also
stage.forward
Examples
X<-matrix(rnorm(2000),ncol=20)
y<-sin(X[,1])+X[,2]^2+X[,3]
nvsd(t(X),y,stop.alpha=0.001,step.size=0.05)
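The t(X) in the example above reflects the orientation noted for the X argument: predictors in rows, samples in columns. A base R sketch of the conversion and a sanity check follows; the variable names are illustrative only.

```r
# Conventional layout: 100 samples (rows) by 20 predictors (columns).
set.seed(3)
X_samples_by_pred <- matrix(rnorm(2000), ncol = 20)
y <- sin(X_samples_by_pred[, 1]) + X_samples_by_pred[, 2]^2 +
  X_samples_by_pred[, 3]

# Layout described for nvsd() and stage.forward(): predictors in rows.
X_pred_by_samples <- t(X_samples_by_pred)

# The number of columns must match the length of the outcome vector.
stopifnot(ncol(X_pred_by_samples) == length(y))
print(dim(X_pred_by_samples))
```

With this layout, the row numbers reported in selected.pred refer to rows of the transposed matrix, that is, to columns of the original one.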
Nonlinear Forward stagewise regression using DCOL
Description
The subroutine conducts forward stagewise regression using DCOL. Either DCOL roughening or spline roughening is conducted.
Usage
stage.forward(X, y, step.size = 0.01, stop.alpha = 0.01, 
stop.var.count = 20, roughening.method = "DCOL", tol = 1e-08, 
spline.df = 5, dcol.sel.only = FALSE, do.plot = F)
Arguments
X | 
 The predictor matrix. Each row is a gene (predictor), each column is a sample. Notice that this orientation differs from most other packages, where each column is a predictor. This is to conform to the other functions in this package, which handle gene expression type data.  | 
y | 
 The numerical outcome vector.  | 
step.size | 
 The step size of the roughening process.  | 
stop.alpha | 
 The alpha level (significance of the current selected predictor) to stop the iterations.  | 
stop.var.count | 
 The maximum number of predictors to select. Once this number is reached, the iteration stops.  | 
roughening.method | 
 The method for roughening. The choices are "DCOL" or "spline".  | 
tol | 
 The tolerance level of sum of squared changes in the residuals.  | 
spline.df | 
 The degrees of freedom for the spline.  | 
dcol.sel.only | 
 TRUE or FALSE. If FALSE, the selection of predictors will consider both linear and nonlinear association significance.  | 
do.plot | 
 Whether to plot the points change in each step.  | 
Details
Please refer to the reference manuscript for details.
Value
A list object is returned. The components include the following.
found.pred | 
 The selected predictors (row number).  | 
ssx.rec | 
 The magnitude of variance explained using the current predictor at each step.  | 
sel.rec | 
 The selected predictor at each step.  | 
p.rec | 
 The p-value of the association between the current residual and the selected predictor at each step.  | 
Author(s)
Tianwei Yu<tianwei.yu@emory.edu>
References
https://arxiv.org/abs/1601.05285
See Also
nvsd
Examples
X<-matrix(rnorm(2000),ncol=20)
y<-sin(X[,1])+X[,2]^2+X[,3]
stage.forward(t(X),y,stop.alpha=0.001,step.size=0.05)