Set of tools to find coherent patterns in gene expression (microarray) data using a Bayesian Sparse Latent Factor Model (SLFM) <DOI:10.1007/978-3-319-12454-4_15>. Considerable effort has been put into building a fast and memory-efficient package, which makes it a computationally convenient alternative for studying patterns of gene expression exhibited in matrices. The package contains the implementation of two versions of the model based on different mixture priors for the loadings: one relies on a degenerate component at zero and the other uses a small-variance normal distribution for the spike part of the mixture.
This package implements the Vine Copula Change Point (VCCP) methodology for the estimation of the number and location of multiple change points in the vine copula structure of multivariate time series. The method uses vine copulas, various state-of-the-art segmentation methods to identify multiple change points, and a likelihood ratio test or the stationary bootstrap for inference. The vine copulas allow for various forms of dependence between time series including tail, symmetric and asymmetric dependence. The functions have been extensively tested on simulated multivariate time series data and fMRI data. For details on the VCCP methodology, please see Xiong & Cribben (2021).
The purpose of this package is to discover the genes that are differentially expressed between two conditions in RNA-seq experiments. Gene expression is measured in counts of transcripts and modeled with the Negative Binomial (NB) distribution using a shrinkage approach for dispersion estimation. The method of moments (MM) estimates for dispersion are shrunk towards an estimated target, which minimizes the average squared difference between the shrinkage estimates and the initial estimates. The exact per-gene probability under the NB model is calculated and used to test the hypothesis that the expected expression of a gene in the two conditions follows the same NB distribution.
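As a rough illustration of the shrinkage idea described above, the following Python sketch computes a method-of-moments dispersion from the NB variance relation Var = mu + phi * mu^2 and pulls per-gene estimates towards a common target. The function names, the toy counts, the fixed weight `delta`, and the use of the average raw estimate as target are all assumptions made for illustration; the package estimates its target and amount of shrinkage from the data, which is not reproduced here.

```python
from statistics import mean, variance

def mm_dispersion(counts):
    """Method-of-moments NB dispersion: phi = (s^2 - mean) / mean^2, clipped at 0."""
    m = mean(counts)
    v = variance(counts)  # unbiased sample variance
    if m == 0:
        return 0.0
    return max((v - m) / (m * m), 0.0)

def shrink(estimates, target, delta):
    """Linear shrinkage of each estimate towards `target` with weight `delta` in [0, 1]."""
    return [(1.0 - delta) * e + delta * target for e in estimates]

# Hypothetical toy counts for three genes (not real data).
genes = {
    "geneA": [10, 14, 8, 20],
    "geneB": [100, 90, 130, 80],
    "geneC": [3, 3, 4, 2],
}
raw = {g: mm_dispersion(c) for g, c in genes.items()}
target = mean(list(raw.values()))   # assumed target: average raw estimate
shrunk = dict(zip(raw, shrink(list(raw.values()), target, delta=0.5)))
```

With `delta=0` the raw estimates are returned unchanged, and with `delta=1` every gene receives the target value.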
This package implements algorithms for calculating microarray enrichment (ACME). It is a set of tools for analysing tiling array data from chromatin immunoprecipitation combined with DNA microarray (ChIP/chip), DNAse hypersensitivity, or other experiments that result in regions of the genome showing enrichment. It does not rely on a specific array technology (although the array should be a tiling array), is very general (it can be applied to any experiment resulting in regions of enrichment), and is very insensitive to array noise or normalization methods. It is also very fast and, given enough memory, can easily be applied to whole-genome tiling array experiments.
Developed for the following tasks: 1- Simulating and computing the maximum likelihood estimator for the Birnbaum-Saunders (BS) distribution; 2- Computing the Bayesian estimator for the parameters of the BS distribution based on reference prior proposed by Xu and Tang (2010) <doi:10.1016/j.csda.2009.08.004> and conjugate prior; 3- Computing the Bayesian estimator for the BS distribution based on conjugate prior; 4- Computing the Bayesian estimator for the BS distribution based on Jeffreys prior given by Achcar (1993) <doi:10.1016/0167-9473(93)90170-X>; 5- Computing the Bayesian estimator for the BS distribution under progressive type-II censoring scheme.
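The simulation task can be sketched with the standard normal representation of the BS distribution: if Z is standard normal, then T = beta * (alpha*Z/2 + sqrt((alpha*Z/2)^2 + 1))^2 follows a BS distribution with shape alpha and scale beta. The Python function names below are hypothetical and do not reflect the package's interface; this is only a minimal sketch of the sampling step.

```python
import math
import random

def bs_from_normal(z, alpha, beta):
    """Map a standard normal value z to a Birnbaum-Saunders(alpha, beta) value."""
    a = alpha * z / 2.0
    return beta * (a + math.sqrt(a * a + 1.0)) ** 2

def rbs(n, alpha, beta, seed=None):
    """Draw n BS(alpha, beta) variates by transforming standard normal draws."""
    rng = random.Random(seed)
    return [bs_from_normal(rng.gauss(0.0, 1.0), alpha, beta) for _ in range(n)]

sample = rbs(1000, alpha=0.5, beta=2.0, seed=42)
```

Because z = 0 maps to beta, the scale parameter is also the median of the distribution, which gives a quick sanity check on simulated samples.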
A novel method is proposed, and an efficient algorithm is implemented, to achieve accurate estimation without a sparsity assumption on the precision matrix, element-wise inference on the precision matrix, and joint estimation of multiple Gaussian graphical models. FLAG() is the main function and takes a data matrix; FlagOneEdge() is used when only one pair of random variables is of interest, in which case their indices should be given. See Flexible and Accurate Methods for Estimation and Inference of Gaussian Graphical Models with Applications, Qian Y (2023) <doi:10.14711/thesis-991013223054603412>, and Qian Y, Hu X, Yang C (2023) <doi:10.48550/arXiv.2306.17584>.
This package provides a comprehensive Shiny-based graphical user interface for conducting a wide range of factor analysis procedures. FAfA (Factor Analysis for All) guides users through data uploading, assumption checking (descriptives, collinearity, multivariate normality, outliers), data wrangling (variable exclusion, data splitting), factor retention analysis (e.g., Parallel Analysis, Hull method, EGA), Exploratory Factor Analysis (EFA) with various rotation and extraction methods, Confirmatory Factor Analysis (CFA) for model testing, Reliability Analysis (e.g., Cronbach's Alpha, McDonald's Omega), Measurement Invariance testing across groups, and item weighting techniques. Results are presented in user-friendly tables and plots, with options for downloading outputs.
This package provides a collection of string functions designed for writing compact and expressive R code. yasp (Yet Another String Package) is simple, fast, dependency-free, and written in pure R. The package provides: a coherent set of abbreviations for paste() from package base with a variety of defaults, such as p() for "paste" and pcc() for "paste and collapse with commas"; wrap(), bracket(), and others for wrapping a string in flanking characters; unwrap() for removing pairs of characters (at any position in a string); and sentence() for cleaning whitespace around punctuation and capitalization appropriate for prose sentences.
This package implements approximation methods for natural capital asset prices suggested by Fenichel and Abbott (2014) <doi:10.1086/676034> in Journal of the Association of Environmental and Resource Economists (JAERE), Fenichel et al. (2016) <doi:10.1073/pnas.1513779113> in Proceedings of the National Academy of Sciences (PNAS), and Yun et al. (2017) in PNAS (accepted), and their extensions: creating Chebyshev polynomial nodes and grids, calculating bases of Chebyshev polynomials, and approximation and simulation for V-approximation (single and multiple stocks, PNAS), P-approximation (single stock, PNAS), and Pdot-approximation (single stock, JAERE). Development of this package was generously supported by the Knobloch Family Foundation.
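The Chebyshev building blocks mentioned above can be sketched in a few lines of Python. The function names and argument order are hypothetical and are not the package's interface; the sketch uses the standard definitions, with nodes x_k = cos((2k - 1) * pi / (2n)) rescaled to a stock interval [lower, upper] and the basis computed by the recurrence T_{j+1}(z) = 2 z T_j(z) - T_{j-1}(z).

```python
import math

def chebyshev_nodes(n, lower=-1.0, upper=1.0):
    """Return the n Chebyshev nodes (roots of T_n) rescaled to [lower, upper]."""
    nodes = []
    for k in range(1, n + 1):
        x = math.cos((2 * k - 1) * math.pi / (2 * n))  # node on [-1, 1]
        nodes.append(0.5 * (lower + upper) + 0.5 * (upper - lower) * x)
    return sorted(nodes)

def chebyshev_basis(x, degree, lower=-1.0, upper=1.0):
    """Evaluate T_0..T_degree at x, after mapping x from [lower, upper] to [-1, 1]."""
    z = (2.0 * x - lower - upper) / (upper - lower)
    values = [1.0, z]
    for _ in range(2, degree + 1):
        values.append(2.0 * z * values[-1] - values[-2])
    return values[: degree + 1]

# Five nodes on a hypothetical stock range [0, 10] and the basis at the first node.
nodes = chebyshev_nodes(5, 0.0, 10.0)
basis_at_first_node = chebyshev_basis(nodes[0], 4, 0.0, 10.0)
```

Evaluating the basis at the nodes gives the design matrix from which approximation coefficients are typically obtained by least squares or interpolation.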
Statistical methods and related graphical representations for the Desirability of Outcome Ranking (DOOR) methodology. The DOOR is a paradigm for the design, analysis, and interpretation of clinical trials and other research studies based on patient-centric benefit-risk evaluation. The package provides functions for generating summary statistics from individual-level or summary-level datasets, conducting DOOR probability-based inference, and visualizing the results. For more details of DOOR methodology, see Hamasaki and Evans (2025) <doi:10.1201/9781003390855>. For more explanation of the statistical methods and the graphics, see the technical document and user manual of the DOOR Shiny apps at <https://methods.bsc.gwu.edu>.
This package provides statistical methods for estimating bivariate dependency (correlation) from marginal summary statistics across multiple studies. The package supports three modules: (1) bivariate correlation estimation for binary outcomes, (2) bivariate correlation estimation for continuous outcomes, and (3) estimation of component-wise means and variances under a conditional two-component Gaussian mixture model for a continuous variable stratified by a binary class label. These methods enable privacy-preserving joint estimation when individual-level data are unavailable. The approaches are detailed in Shang, Tsao, and Zhang (2025a) <doi:10.48550/arXiv.2505.03995> and Shang, Tsao, and Zhang (2025b) <doi:10.48550/arXiv.2508.02057>.
Computes characteristics of independent rainfall events (duration, total rainfall depth, and intensity) extracted from a sub-daily rainfall time series based on the inter-event time definition (IETD) method. To have a reference value of IETD, it also analyzes/computes IETD values through three methods: autocorrelation analysis, the average annual number of events analysis, and coefficient of variation analysis. Ideal for analyzing the sensitivity of IETD to characteristics of independent rainfall events. Adams B, Papa F (2000) <ISBN: 978-0-471-33217-6>. Joo J et al. (2014) <doi:10.3390/w6010045>. Restrepo-Posada P, Eagleson P (1982) <doi:10.1016/0022-1694(82)90136-6>.
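The IETD idea can be illustrated with a short Python sketch: wet time steps separated by a dry gap shorter than the IETD belong to the same event, while a dry gap at least as long as the IETD starts a new one. The function name, the returned dictionary keys, and the convention that a gap exactly equal to the IETD separates events are assumptions for illustration, not the package's behaviour.

```python
def rainfall_events(depths, ietd_steps, step_hours=1.0):
    """Split a rainfall series into events separated by >= ietd_steps dry steps.

    Returns one dict per event with duration (h), depth, and mean intensity.
    """
    def summarise(start, end, total):
        duration = (end - start + 1) * step_hours
        return {"start": start, "end": end, "duration": duration,
                "depth": total, "intensity": total / duration}

    events = []
    start = last_wet = None
    total = 0.0
    for i, d in enumerate(depths):
        if d <= 0:
            continue  # dry step: nothing to accumulate
        if start is not None and i - last_wet - 1 >= ietd_steps:
            # The dry gap is long enough: close the previous event.
            events.append(summarise(start, last_wet, total))
            start, total = None, 0.0
        if start is None:
            start = i
        last_wet = i
        total += d
    if start is not None:
        events.append(summarise(start, last_wet, total))
    return events

# Hypothetical hourly depths (mm) with a 3-hour IETD.
series = [0, 2, 3, 0, 1, 0, 0, 0, 0, 4, 0]
events = rainfall_events(series, ietd_steps=3)
```

Re-running the function over a range of `ietd_steps` values shows how the number of events and their characteristics respond to the chosen IETD.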
An efficient sensitivity analysis for stochastic models based on Monte Carlo samples. Provides weights on simulated scenarios from a stochastic model, such that stressed random variables fulfil given probabilistic constraints (e.g. specified values for risk measures), under the new scenario weights. Scenario weights are selected by constrained minimisation of the relative entropy to the baseline model. The SWIM package is based on Pesenti S.M., Millossovich P., Tsanakas A. (2019) "Reverse Sensitivity Testing: What does it take to break the model" <openaccess.city.ac.uk/id/eprint/18896/> and Pesenti S.M. (2021) "Reverse Sensitivity Analysis for Risk Modelling" <https://www.ssrn.com/abstract=3878879>.
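For the simplest constraint, a prescribed mean of one stressed variable, minimising relative entropy to an equally weighted baseline leads to exponentially tilted weights, w_i proportional to exp(theta * x_i), with theta chosen so that the weighted mean hits the target. The Python sketch below solves for theta by bisection. The function names and the search bounds on theta are assumptions for illustration; they are not the SWIM interface, and risk-measure constraints such as VaR or ES are not covered.

```python
import math

def tilted_weights(x, theta):
    """Normalised weights proportional to exp(theta * x_i) (numerically stabilised)."""
    s = [theta * xi for xi in x]
    m = max(s)
    e = [math.exp(si - m) for si in s]
    total = sum(e)
    return [ei / total for ei in e]

def mean_stress_weights(x, target, lo=-50.0, hi=50.0, iters=200):
    """Scenario weights that minimise relative entropy subject to a stressed mean.

    Assumes the solving theta lies in [lo, hi]; the weighted mean is
    increasing in theta, so bisection applies.
    """
    if not min(x) < target < max(x):
        raise ValueError("target must lie strictly between min(x) and max(x)")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        w = tilted_weights(x, mid)
        if sum(wi * xi for wi, xi in zip(w, x)) < target:
            lo = mid
        else:
            hi = mid
    return tilted_weights(x, 0.5 * (lo + hi))

# Hypothetical simulated scenarios with baseline mean 3, stressed to mean 3.5.
scenarios = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = mean_stress_weights(scenarios, target=3.5)
```

The same weights can then be applied to any other output of the simulated model to see how it responds to the stress.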
This package provides a general framework for two-directional simultaneous inference for high-dimensional as well as fixed-dimensional models with manifest variable or latent variable structure, such as high-dimensional mean models, high-dimensional sparse regression models, and high-dimensional latent factor models. It performs simultaneous inference on a set of parameters from two directions: one tests whether the parameters estimated as zero are indeed zero, and the other tests whether there are zeros in the set of parameters estimated as non-zero. For more details, see Wei Liu, et al. (2022) <doi:10.48550/arXiv.2012.11100>.
Dino normalizes single-cell mRNA sequencing data to correct for technical variation, particularly sequencing depth, prior to downstream analysis. The approach produces a matrix of corrected expression for which the dependency between sequencing depth and the full distribution of normalized expression is removed; many existing methods aim to remove only the dependency between sequencing depth and the mean of the normalized expression. This is particularly useful in the context of highly sparse datasets such as those produced by 10X Genomics and other unique molecular identifier (UMI) based microfluidics protocols, for which the depth-dependent proportion of zeros in the raw expression data can otherwise present a challenge.
This package provides methods for manipulating regression models and for describing these in a style adapted for medical journals. It contains functions for generating an HTML table with crude and adjusted estimates, plotting hazard ratios, plotting model estimates and confidence intervals using forest plots, and extending this to comparing multiple models in a single forest plot. In addition to the descriptive methods, there are functions for the robust covariance matrix provided by the sandwich package, a function for adding non-linearities to a model, and a wrapper around the Epi package's Lexis() functions for time-splitting a dataset when modeling non-proportional hazards in Cox regressions.
Chinese numerals processing in R, such as conversion between Chinese numerals and Arabic numerals as well as detection and extraction of Chinese numerals in character objects and strings. This package supports the casual scale naming system and the respective SI prefix systems used in mainland China and Taiwan: "The State Council's Order on the Unified Implementation of Legal Measurement Units in Our Country", The State Council of the People's Republic of China (1984); "Names, Definitions and Symbols of the Legal Units of Measurement and the Decimal Multiples and Submultiples", Ministry of Economic Affairs (2019) <https://gazette.nat.gov.tw/egFront/detail.do?metaid=108965>.
Bayesian factor models are effective tools for dimension reduction, especially for large-scale multivariate datasets. They allow researchers to understand the latent factors of the data, which are linear or non-linear combinations of the variables. The DIFM package (Dynamic Intrinsic Conditional Autocorrelative Priors (ICAR) Spatiotemporal Factor Models) provides functions to run Markov Chain Monte Carlo (MCMC), evaluation methods, and visual plots from Shin and Ferreira (2023) <doi:10.1016/j.spasta.2023.100763>. The method is a class of Bayesian factor models that can account for spatial and temporal correlations. By incorporating these correlations, the model can capture specific behaviors and provide predictions.
Kernel Learning Integrative Clustering (KLIC) is an algorithm for combining multiple kernels, each representing a different measure of the similarity between a set of observations. The contribution of each kernel to the final clustering is weighted according to the amount of information carried by it. As well as providing the functions required to perform the kernel-based clustering, this package also allows the user to simply give the data as input: the kernels are then built using consensus clustering. Different strategies to choose the best number of clusters are also available. For further details please see Cabassi and Kirk (2020) <doi:10.1093/bioinformatics/btaa593>.
Calculation routines based on the FOCUS Kinetics Report (2006, 2014). Includes a function for conveniently defining differential equation models, with model solution based on eigenvalues if possible or using numerical solvers. If a C compiler (on Windows: 'Rtools') is installed, differential equation models are solved using automatically generated C functions. Non-constant errors can be taken into account using variance by variable or two-component error models <doi:10.3390/environments6120124>. Hierarchical degradation models can be fitted using nonlinear mixed-effects model packages as a back end <doi:10.3390/environments8080071>. Please note that no warranty is implied for correctness of results or fitness for a particular purpose.
This package implements multivariate Fay-Herriot models for small area estimation. It uses the empirical best linear unbiased prediction (EBLUP) estimator. Multivariate models consider the correlation of several target variables and borrow strength from auxiliary variables to improve the effectiveness of a domain sample size. The models accommodated by this package are the univariate model with several target variables (model 0), the multivariate model (model 1), the autoregressive multivariate model (model 2), and the heteroscedastic autoregressive multivariate model (model 3). Functions provide EBLUP estimators and mean squared error (MSE) estimators for each model. These models were developed by Roberto Benavent and Domingo Morales (2015) <doi:10.1016/j.csda.2015.07.013>.
Support for a variety of commonly used precision agriculture operations. Includes functions to download and process raw satellite images from Sentinel-2 <https://documentation.dataspace.copernicus.eu/APIs/OData.html>. Includes functions that download vegetation index statistics for a given period of time, without the need to download the raw images <https://documentation.dataspace.copernicus.eu/APIs/SentinelHub/Statistical.html>. There are also functions to download and visualize weather data in a historical context. Lastly, the package also contains functions to process yield monitor data. These functions can build polygons around recorded data points, evaluate the overlap between polygons, clean yield data, and smooth yield maps.
The package obtains parameter estimates, i.e., maximum likelihood estimators (MLE), via the Expectation-Maximization (EM) algorithm for the Finite Mixture of Regression (FMR) models with Normal distribution, and MLE for the Finite Mixture of Accelerated Failure Time Regression (FMAFTR) subject to right censoring with Log-Normal and Weibull distributions via the EM algorithm and the Newton-Raphson algorithm (for Weibull distribution). More importantly, the package obtains the maximum penalized likelihood estimator (MPLE) for both FMR and FMAFTR models (collectively called FMRs). A component-wise tuning parameter selection based on a component-wise BIC is implemented in the package. Furthermore, this package provides Ridge Regression and Elastic Net.
Calculates nonparametric pointwise confidence intervals for the survival distribution for right censored data, and for medians [Fay and Brittain <DOI:10.1002/sim.6905>]. Has two-sample tests for dissimilarity (e.g., difference, ratio or odds ratio) in survival at a fixed time, and differences in medians [Fay, Proschan, and Brittain <DOI:10.1111/biom.12231>]. Basically, the package gives exact one- and two-sample inference methods for Kaplan-Meier curves (e.g., generalizing Fisher's exact test to allow for right censoring), which are especially important for latter parts of the survival curve, small sample sizes or heavily censored data. Includes mid-p options.
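For orientation, the Kaplan-Meier point estimate that these inference methods are built around can be sketched in Python as follows. The function name and input layout are assumptions for illustration; the sketch covers only the product-limit estimate and does not reproduce the package's exact confidence intervals, tests, or mid-p adjustments.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival for right censored data.

    times  : observed times
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all subjects sharing this observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # events and censored subjects leave the risk set
    return curve

# Hypothetical small sample: two subjects censored (at times 2 and 4).
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

With few subjects left at risk late in follow-up, each step of the curve is large, which is why exact methods matter most for the later parts of the curve and for small or heavily censored samples.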