The HOPACH clustering algorithm builds a hierarchical tree of clusters by recursively partitioning a data set, while ordering and possibly collapsing clusters at each level. The algorithm uses the Mean/Median Split Silhouette (MSS) criteria to identify the level of the tree with maximally homogeneous clusters. It also runs the tree down to produce a final ordered list of the elements. The non-parametric bootstrap allows one to estimate the probability that each element belongs to each cluster (fuzzy clustering).
Package nethet is an implementation of statistical solid methodology enabling the analysis of network heterogeneity from high-dimensional data. It combines several implementations of recent statistical innovations useful for estimation and comparison of networks in a heterogeneous, high-dimensional setting. In particular, we provide code for formal two-sample testing in Gaussian graphical models (differential network and GGM-GSA; Stadler and Mukherjee, 2013, 2014) and make a novel network-based clustering algorithm available (mixed graphical lasso, Stadler and Mukherjee, 2013).
This package provides a daemon for checking running and not running processes. It reads the /proc
directory every n seconds and does a POSIX regexp on the process names. The daemon runs a user-provided script when it detects a program in the running processes, or an alternate script if it doesn't detect the program. The daemon can only be called by the root user, but can use sudo -u user
in the process called if needed.
Oscope is a oscillatory genes identifier in unsynchronized single cell RNA-seq. This statistical pipeline has been developed to identify and recover the base cycle profiles of oscillating genes in an unsynchronized single cell RNA-seq experiment. The Oscope pipeline includes three modules: a sine model module to search for candidate oscillator pairs; a K-medoids clustering module to cluster candidate oscillators into groups; and an extended nearest insertion module to recover the base cycle order for each oscillator group.
This package provides an optimization method based on sequential quadratic programming for maximum likelihood estimation of the mixture proportions in a finite mixture model where the component densities are known. The algorithm is expected to obtain solutions that are at least as accurate as the state-of-the-art MOSEK interior-point solver, and they are expected to arrive at solutions more quickly when the number of samples is large and the number of mixture components is not too large.
Addressing a central challenge encountered in Mendelian randomization (MR) studies, where MR primarily focuses on discerning the effects of individual exposures on specific outcomes and establishes causal links between them. Using a network-based methodology, the intricacy involving interdependent outcomes due to numerous factors has been tackled through this routine. Based on Ni et al. (2018) <doi:10.1214/17-BA1087>, MR.RGM extends to a broader exploration of the causal landscape by leveraging on network structures and involves the construction of causal graphs that capture interactions between response variables and consequently between responses and instrument variables. The resulting Graph visually represents these causal connections, showing directed edges with effect sizes labeled. MR.RGM facilitates the navigation of various data availability scenarios effectively by accommodating three input formats, i.e., individual-level data and two types of summary-level data. In the process, causal effects, adjacency matrices, and other essential parameters of the complex biological networks, are estimated. Besides, MR.RGM provides uncertainty quantification for specific network structures among response variables.
Bisulfite-treated RNA non-conversion in a set of samples is analysed as follows : each sample's non-conversion distribution is identified to a Poisson distribution. P-values adjusted for multiple testing are calculated in each sample. Combined non-conversion P-values and standard errors are calculated on the intersection of the set of samples. For further details, see C Legrand, F Tuorto, M Hartmann, R Liebers, D Jakob, M Helm and F Lyko (2017) <doi:10.1101/gr.210666.116>.
The bootstrap ARDL tests for cointegration is the main functionality of this package. It also acts as a wrapper of the most commond ARDL testing procedures for cointegration: the bound tests of Pesaran, Shin and Smith (PSS; 2001 - <doi:10.1002/jae.616>) and the asymptotic test on the independent variables of Sam, McNown
and Goh (SMG: 2019 - <doi:10.1016/j.econmod.2018.11.001>). Bootstrap and bound tests are performed under both the conditional and unconditional ARDL models.
Computes solutions for linear and logistic regression models with potentially high-dimensional categorical predictors. This is done by applying a nonconvex penalty (SCOPE) and computing solutions in an efficient path-wise fashion. The scaling of the solution paths is selected automatically. Includes functionality for selecting tuning parameter lambda by k-fold cross-validation and early termination based on information criteria. Solutions are computed by cyclical block-coordinate descent, iterating an innovative dynamic programming algorithm to compute exact solutions for each block.
This package provides a variety of methods to identify data quality issues in process-oriented data, which are useful to verify data quality in a process mining context. Builds on the class for activity logs implemented in the package bupaR
'. Methods to identify data quality issues either consider each activity log entry independently (e.g. missing values, activity duration outliers,...), or focus on the relation amongst several activity log entries (e.g. batch registrations, violations of the expected activity order,...).
The purpose of this package is to tests whether a given moment of the distribution of a given sample is finite or not. For heavy-tailed distributions with tail exponent b, only moments of order smaller than b are finite. Tail exponent and heavy- tailedness are notoriously difficult to ascertain. But the finiteness of moments (including fractional moments) can be tested directly. This package does that following the test suggested by Trapani (2016) <doi:10.1016/j.jeconom.2015.08.006>.
Automatically performs desired statistical tests (e.g. wilcox.test()
, t.test()
) to compare between groups, and adds the resulting p-values to the plot with an annotation bar. Visualizing group differences are frequently performed by boxplots, bar plots, etc. Statistical test results are often needed to be annotated on these plots. This package provides a convenient function that works on ggplot2 objects, performs the desired statistical test between groups of interest and annotates the test results on the plot.
Recently, multiple marginal variable selection methods have been developed and shown to be effective in Gene-Environment interactions studies. We propose a novel marginal Bayesian variable selection method for Gene-Environment interactions studies. In particular, our marginal Bayesian method is robust to data contamination and outliers in the outcome variables. With the incorporation of spike-and-slab priors, we have implemented the Gibbs sampler based on Markov Chain Monte Carlo. The core algorithms of the package have been developed in C++'.
Addons for the mice package to perform multiple imputation using chained equations with two-level data. Includes imputation methods dedicated to sporadically and systematically missing values. Imputation of continuous, binary or count variables are available. Following the recommendations of Audigier, V. et al (2018) <doi:10.1214/18-STS646>, the choice of the imputation method for each variable can be facilitated by a default choice tuned according to the structure of the incomplete dataset. Allows parallel calculation and overimputation for mice'.
This package offers three important components: (1) to construct a use-defined linear mixed model, (2) to employ one of linear mixed model approaches: minimum norm quadratic unbiased estimation (MINQUE) (Rao, 1971) for variance component estimation and random effect prediction; and (3) to employ a jackknife resampling technique to conduct various statistical tests. In addition, this package provides the function for model or data evaluations.This R package offers fast computations for large data sets analyses for various irregular data structures.
Segregation is a network-level property such that edges between predefined groups of vertices are relatively less likely. Network homophily is a individual-level tendency to form relations with people who are similar on some attribute (e.g. gender, music taste, social status, etc.). In general homophily leads to segregation, but segregation might arise without homophily. This package implements descriptive indices measuring homophily/segregation. It is a computational companion to Bojanowski & Corten (2014) <doi:10.1016/j.socnet.2014.04.001>.
This package implements a unified framework of parametric simplex method for a variety of sparse learning problems (e.g., Dantzig selector (for linear regression), sparse quantile regression, sparse support vector machines, and compressive sensing) combined with efficient hyper-parameter selection strategies. The core algorithm is implemented in C++ with Eigen3 support for portable high performance linear algebra. For more details about parametric simplex method, see Haotian Pang (2017) <https://papers.nips.cc/paper/6623-parametric-simplex-method-for-sparse-learning.pdf>.
The plsdof package provides Degrees of Freedom estimates for Partial Least Squares (PLS) Regression. Model selection for PLS is based on various information criteria (aic, bic, gmdl) or on cross-validation. Estimates for the mean and covariance of the PLS regression coefficients are available. They allow the construction of approximate confidence intervals and the application of test procedures (Kramer and Sugiyama 2012 <doi:10.1198/jasa.2011.tm10107>). Further, cross-validation procedures for Ridge Regression and Principal Components Regression are available.
This package provides functions to infer co-mapping trait hotspots and causal models. Chaibub Neto E, Keller MP, Broman AF, Attie AD, Jansen RC, Broman KW, Yandell BS (2012) Quantile-based permutation thresholds for QTL hotspots. Genetics 191 : 1355-1365. <doi:10.1534/genetics.112.139451>. Chaibub Neto E, Broman AT, Keller MP, Attie AD, Zhang B, Zhu J, Yandell BS (2013) Modeling causality for pairs of phenotypes in system genetics. Genetics 193 : 1003-1013. <doi:10.1534/genetics.112.147124>.
This software provides tools for quantitative trait mapping in populations such as advanced intercross lines where relatedness among individuals should not be ignored. It can estimate background genetic variance components, impute missing genotypes, simulate genotypes, perform a genome scan for putative quantitative trait loci (QTL), and plot mapping results. It also has functions to calculate identity coefficients from pedigrees, especially suitable for pedigrees that consist of a large number of generations, or estimate identity coefficients from genotypic data in certain circumstances.
This package implements the following approaches for multidimensional scaling (MDS) based on stress minimization using majorization (smacof): ratio/interval/ordinal/spline MDS on symmetric dissimilarity matrices, MDS with external constraints on the configuration, individual differences scaling (idioscal, indscal), MDS with spherical restrictions, and ratio/interval/ordinal/spline unfolding (circular restrictions, row-conditional). Various tools and extensions like jackknife MDS, bootstrap MDS, permutation tests, MDS biplots, gravity models, unidimensional scaling, drift vectors (asymmetric MDS), classical scaling, and Procrustes are implemented as well.
Gene set analysis methods exist to combine SNP-level association p-values into gene sets, calculating a single association p-value for each gene set. This package implements two such methods that require only the calculated SNP p-values, the gene set(s) of interest, and a correlation matrix (if desired). One method (GLOSSI) requires independent SNPs and the other (VEGAS) can take into account correlation (LD) among the SNPs. Built-in plotting functions are available to help users visualize results.
nuCpos
, a derivative of NuPoP
, is an R package for prediction of nucleosome positions. nuCpos
calculates local and whole nucleosomal histone binding affinity (HBA) scores for a given 147-bp sequence. Note: This package was designed to demonstrate the use of chemical maps in prediction. As the parental package NuPoP
now provides chemical-map-based prediction, the function for dHMM-based
prediction was removed from this package. nuCpos
continues to provide functions for HBA calculation.
This package is for building isoscapes using mixed models and inferring the geographic origin of samples based on their isotopic ratios. This package is essentially a simplified interface to several other packages which implements a new statistical framework based on mixed models. It uses spaMM
for fitting and predicting isoscapes, and assigning an organism's origin depending on its isotopic ratio. IsoriX
also relies heavily on the package rasterVis
for plotting the maps produced with terra using lattice'.