Implementations of several methods for principal component analysis using the L1 norm. The package depends on COIN-OR Clp version >= 1.17.4. The methods implemented are PCA-L1 (Kwak 2008) <DOI:10.1109/TPAMI.2008.114>, L1-PCA (Ke and Kanade 2003, 2005) <DOI:10.1109/CVPR.2005.309>, L1-PCA* (Brooks, Dula, and Boone 2013) <DOI:10.1016/j.csda.2012.11.007>, L1-PCAhp (Visentin, Prestwich and Armagan 2016) <DOI:10.1007/978-3-319-46227-1_37>, wPCA (Park and Klabjan 2016) <DOI:10.1109/ICDM.2016.0054>, awPCA (Park and Klabjan 2016) <DOI:10.1109/ICDM.2016.0054>, PCA-Lp (Kwak 2014) <DOI:10.1109/TCYB.2013.2262936>, and SharpEl1-PCA (Brooks and Dula, submitted).
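As a rough illustration of the L1-norm idea behind PCA-L1, the following minimal Python sketch runs a sign-flipping fixed-point iteration that searches for a unit vector w locally maximising the sum of |w . x| over the data rows. It is not the package's code: the function name `pca_l1_direction`, the choice of starting vector, and the stopping rule are assumptions made for this example, and refinements such as perturbing w when a projection is exactly zero are omitted.

```python
import math


def pca_l1_direction(rows, max_iter=100):
    """Return a unit vector w that locally maximises sum(|w . x|) over rows.

    Assumes the rows are already centred and at least one row is nonzero.
    """
    dim = len(rows[0])
    # Assumed initialisation: start from the row with the largest Euclidean norm.
    start = max(rows, key=lambda r: sum(v * v for v in r))
    norm = math.sqrt(sum(v * v for v in start))
    w = [v / norm for v in start]
    for _ in range(max_iter):
        total = [0.0] * dim
        for r in rows:
            # Flip each row so that it points into the half-space of w.
            sign = 1.0 if sum(a * b for a, b in zip(w, r)) >= 0 else -1.0
            for j in range(dim):
                total[j] += sign * r[j]
        norm = math.sqrt(sum(v * v for v in total))
        if norm == 0.0:
            break
        new_w = [v / norm for v in total]
        # Stop once the direction no longer changes.
        if all(abs(a - b) < 1e-12 for a, b in zip(w, new_w)):
            break
        w = new_w
    return w


# Toy centred data that varies mostly along the first coordinate.
data = [(3.0, 0.0), (-3.0, 0.0), (1.0, 0.1), (-1.0, -0.1)]
print(pca_l1_direction(data))
```

On this toy data the returned direction lies close to the first coordinate axis, which is where most of the absolute deviation is concentrated.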
The package can be used to generate a spatial population for different levels of relationship among the dependent and auxiliary variables, along with spatially varying model parameters. The spatial layout is a [0,k-1]x[0,k-1] square region on which observations are collected at (k x k) lattice points, with unit distance between any two neighbouring points along the horizontal and vertical axes. For method details see Chao, Liu., Chuanhua, Wei. and Yunan, Su. (2018) <doi:10.1080/10485252.2018.1499907>. The generated spatial population can be used in Geographically Weighted Regression model-based analysis for studying spatially varying relationships among the variables. Furthermore, various statistical analyses can be performed on the generated spatial data.
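The lattice layout described above can be sketched in a few lines of Python. This only illustrates the coordinate grid, not the package's population generator; the function name `lattice_points` is hypothetical.

```python
def lattice_points(k):
    """Return the k*k integer lattice points of the square [0, k-1] x [0, k-1]."""
    return [(x, y) for x in range(k) for y in range(k)]


points = lattice_points(4)
print(len(points))            # 16 points on a 4 x 4 lattice
print(points[0], points[-1])  # (0, 0) (3, 3)
```

Neighbouring points differ by exactly one unit along either axis, which matches the unit spacing stated in the description.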
Implementation of functions for fitting taper curves (a semiparametric linear mixed effects taper model) to diameter measurements along stems. Further functions are provided to estimate the uncertainty around the predicted curves, to calculate timber volume (also by sections) and marginal (e.g., upper) diameters. For cases where tree heights are not measured, methods for estimating additional variance in volume predictions resulting from uncertainties in tree height models (tariffs) are provided. The example data include the taper curve parameters for Norway spruce used in the 3rd German NFI fitted to 380 trees and a subset of section-wise diameter measurements of these trees. The functions implemented here are detailed in Kublin, E., Breidenbach, J., Kaendler, G. (2013) <doi:10.1007/s10342-013-0715-0>.
Genome-wide studies of translational control are emerging as a tool to study various biological conditions. The output from such an analysis is both the mRNA level (e.g. cytosolic mRNA level) and the level of mRNA actively involved in translation (the actively translating mRNA level) for each mRNA. The standard analysis of such data aims to identify differential translation between two or more sample classes, i.e., differences in actively translated mRNA levels that are independent of underlying differences in cytosolic mRNA levels. This package allows for such analysis using partial variances and the random variance model. As tens of thousands of mRNAs are analyzed in parallel, the library performs a number of tests to ensure that the data set is suitable for such analysis.
This package provides a set of psychometric tools for cognitive diagnosis modeling based on the generalized deterministic inputs, noisy and gate (G-DINA) model by de la Torre (2011) <doi:10.1007/s11336-011-9207-7> and its extensions, including the sequential G-DINA model by Ma and de la Torre (2016) <doi:10.1111/bmsp.12070> for polytomous responses, and the polytomous G-DINA model by Chen and de la Torre <doi:10.1177/0146621613479818> for polytomous attributes. Joint attribute distribution can be independent, saturated, higher-order, loglinear smoothed or structured. Q-matrix validation, item and model fit statistics, model comparison at test and item level and differential item functioning can also be conducted. A graphical user interface is also provided.
This package provides a computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well-defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman's random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available.
dinoR tests for significant differences in NOMe-seq footprints between two conditions, using genomic regions of interest (ROI) centered around a landmark, for example a transcription factor (TF) motif. This package takes NOMe-seq data (GCH methylation/protection) in the form of a Ranged Summarized Experiment as input. dinoR can be used to group sequencing fragments into 3 or 5 categories representing characteristic footprints (TF bound, nucleosome bound, open chromatin), and to plot the percentage of fragments in each category in a heatmap or averaged across different ROI groups, for example those containing a common TF motif. It is designed to compare footprints between two sample groups, using edgeR's quasi-likelihood methods on the total fragment counts per ROI, sample, and footprint category.
Calculates concentration and dispersion in ordered rating scales. It implements various measures of concentration and dispersion to describe what researchers variably call agreement, concentration, consensus, dispersion, or polarization among respondents in ordered data. It also implements other related measures to classify distributions. In addition to a generic city-block based concentration measure and a generic dispersion measure, the package implements various measures, including van der Eijk's (2001) <DOI:10.1023/A:1010374114305> measure of agreement A, and measures of concentration by Leik, Tastle and Wierman, Blair and Lacy, Kvalseth, Berry and Mielke, Reardon, and Garcia-Montalvo and Reynal-Querol. Furthermore, the package provides an implementation of Galtung's AJUS-system to classify distributions, as well as a function to identify the position of multiple modes.
This package provides a genome-wide survival framework that integrates sequential conditional independent tuples and the saddlepoint approximation method to provide SNP-level false discovery rate control while improving power, particularly for biobank-scale survival analyses with low event rates. The method is based on model-X knockoffs as described in Barber and Candes (2015) <doi:10.1214/15-AOS1337> and fast survival analysis methods from Bi et al. (2020) <doi:10.1016/j.ajhg.2020.06.003>. Shrinkage algorithmic leveraging accelerates the generation of multiple knockoffs in large genetic cohorts. This CRAN version uses standard Cox regression for association testing. For enhanced performance on very large datasets, users may optionally install the SPACox package from GitHub, which provides saddlepoint approximation methods for survival analysis.
This package provides a set of control charts for batch processes based on the VAR model. The package implements the T2.var and W.var control charts, which are based on VAR model coefficients and use the couple vectors theory. At each time instant, the VAR coefficients are estimated from a historical in-control dataset and a decision rule is applied to classify a new batch online. These charts allow efficient online monitoring from the very first time instant. An offline version is also available. To evaluate the charts' performance, the package contains functions to generate batch data for offline and online monitoring. See Danilo Marcondes Filho and Marcio Valk (2020) <doi:10.1016/j.ejor.2019.12.038>.
Approximate marginal maximum likelihood estimation of multidimensional latent variable models via adaptive quadrature or Laplace approximations to the integrals in the likelihood function, as presented for confirmatory factor analysis models in Jin, S., Noh, M., and Lee, Y. (2018) <doi:10.1080/10705511.2017.1403287>, for item response theory models in Andersson, B., and Xin, T. (2021) <doi:10.3102/1076998620945199>, and for generalized linear latent variable models in Andersson, B., Jin, S., and Zhang, M. (2023) <doi:10.1016/j.csda.2023.107710>. Models implemented include the generalized partial credit model, the graded response model, and generalized linear latent variable models for Poisson, negative-binomial and normal distributions. Supports a combination of binary, ordinal, count and continuous observed variables and multiple group models.
This package implements empirical Bayes approaches to genotype polyploids from next generation sequencing data while accounting for allele bias, overdispersion, and sequencing error. The main functions are flexdog() and multidog(), which allow the specification of many different genotype distributions. Also provided are functions to simulate genotypes, rgeno(), and read-counts, rflexdog(), as well as functions to calculate oracle genotyping error rates, oracle_mis(), and correlation with the true genotypes, oracle_cor(). These latter two functions are useful for read depth calculations. Run browseVignettes(package = "updog") in R for example usage. See Gerard et al. (2018) <doi:10.1534/genetics.118.301468> and Gerard and Ferrao (2020) <doi:10.1093/bioinformatics/btz852> for details on the implemented methods.
Implementation of integrative analysis methods based on a two-part penalization, which performs dimension reduction and mines the heterogeneity and association of multiple studies with compatible designs. The package provides integrative analysis methods including integrative sparse principal component analysis (Fang et al., 2018), integrative sparse partial least squares (Liang et al., 2021) and integrative sparse canonical correlation analysis, as well as the corresponding individual analysis and meta-analysis versions. References: (1) Fang, K., Fan, X., Zhang, Q., and Ma, S. (2018). Integrative sparse principal component analysis. Journal of Multivariate Analysis, <doi:10.1016/j.jmva.2018.02.002>. (2) Liang, W., Ma, S., Zhang, Q., and Zhu, T. (2021). Integrative sparse partial least squares. Statistics in Medicine, <doi:10.1002/sim.8900>.
Bayesian supervised predictive classifiers, hypothesis testing, and parametric estimation under Partition Exchangeability are implemented. The two classifiers presented are the marginal classifier (which assumes the test data are i.i.d.) and a more computationally costly but more accurate simultaneous classifier (which finds a labelling for the entire test dataset at once, based on simultaneous use of all the test data to predict each label). We also provide the Maximum Likelihood Estimation (MLE) of the only underlying parameter of the partition exchangeability generative model, as well as hypothesis testing statistics for equality of this parameter with a single value, alternative, or multiple samples. We present functions to simulate sequences from the Ewens Sampling Formula as the realisation of the Poisson-Dirichlet distribution, together with their respective probabilities.
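To make the Ewens Sampling Formula mentioned above concrete, the following Python sketch evaluates the formula for a partition given by its block sizes and draws block sizes with the Chinese restaurant process, a standard construction that yields Ewens-distributed partitions. It is an illustration only, not the package's interface; the function names `ewens_probability` and `simulate_block_sizes` and the parameter name `theta` are assumptions made for this example.

```python
import math
import random
from collections import Counter


def ewens_probability(block_sizes, theta):
    """Ewens Sampling Formula probability of a partition with these block sizes."""
    n = sum(block_sizes)
    counts = Counter(block_sizes)  # counts[j] = number of blocks of size j
    rising = 1.0
    for i in range(n):
        rising *= theta + i        # theta * (theta + 1) * ... * (theta + n - 1)
    prob = math.factorial(n) / rising
    for j, a_j in counts.items():
        prob *= theta ** a_j / (j ** a_j * math.factorial(a_j))
    return prob


def simulate_block_sizes(n, theta, rng):
    """Draw block sizes for n items via the Chinese restaurant process."""
    sizes = []
    for i in range(n):  # i items have already been placed
        if rng.random() < theta / (theta + i):
            sizes.append(1)  # open a new block
        else:
            # Join an existing block with probability proportional to its size.
            pick = rng.choices(range(len(sizes)), weights=sizes)[0]
            sizes[pick] += 1
    return sorted(sizes, reverse=True)


rng = random.Random(1)
sizes = simulate_block_sizes(10, 1.5, rng)
print(sizes, ewens_probability(sizes, 1.5))
```

As a sanity check, the probabilities of all integer partitions of a fixed n sum to one for any positive `theta`.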
This package provides a sensitivity analysis approach for unmeasured confounding in observational data with multiple treatments and a binary outcome. This approach derives the general bias formula and provides adjusted causal effect estimates in response to various assumptions about the degree of unmeasured confounding. Nested multiple imputation is embedded within the Bayesian framework to integrate uncertainty about the sensitivity parameters and sampling variability. Bayesian Additive Regression Trees (BART) are used for outcome modeling. The causal estimands are the conditional average treatment effects (CATE) based on the risk difference. For more details, see Hu L et al. (2020) A flexible sensitivity analysis approach for unmeasured confounding with multiple treatments and a binary outcome with application to SEER-Medicare lung cancer data <arXiv:2012.06093>.
This package implements the Diffusion Non-Additive (DNA) model proposed by Heo, Boutelet, and Sung (2025+) <doi:10.48550/arXiv.2506.08328> for multi-fidelity computer experiments with tuning parameters. The DNA model captures nonlinear dependencies across fidelity levels using Gaussian process priors and is particularly effective when simulations at different fidelity levels are nonlinearly correlated. The DNA model targets not only interpolation across given fidelity levels but also extrapolation to smaller tuning parameters, including the exact solution corresponding to a zero-valued tuning parameter, leveraging a nonseparable covariance kernel structure that models interactions between the tuning parameter and input variables. Closed-form expressions for the predictive mean and variance enable efficient inference and uncertainty quantification. Hyperparameters in the model are estimated via maximum likelihood estimation.
Provides a method based on the EM algorithm to estimate the parameters of a mixture model, the Sigmoid-Normal Model, in which the samples come from several normal distributions (also called subgroups) whose means are determined by covariates Z and coefficients alpha, while the variances are homogeneous. The subgroup each item belongs to is determined by covariates X and coefficients eta through a sigmoid link function, which is an extension of the logistic link function. The bootstrap is used to estimate the standard errors of the parameters. When the sample is indeed separable, and after removing estimates with abnormal sigma, the estimation of alpha performs quite well. The method was used to explore the subgroup structure of HIV patients and can be applied in other domains where subgroup structure exists.
Maximum likelihood estimation, random value generation, density computation and other functions for the exponential-Poisson, generalised exponential-Poisson and Poisson-exponential distributions. References include: Rodrigues G. C., Louzada F. and Ramos P. L. (2018). "Poisson-exponential distribution: different methods of estimation". Journal of Applied Statistics, 45(1): 128--144. <doi:10.1080/02664763.2016.1268571>. Louzada F., Ramos, P. L. and Ferreira, H. P. (2020). "Exponential-Poisson distribution: estimation and applications to rainfall and aircraft data with zero occurrence". Communications in Statistics--Simulation and Computation, 49(4): 1024--1043. <doi:10.1080/03610918.2018.1491988>. Barreto-Souza W. and Cribari-Neto F. (2009). "A generalization of the exponential-Poisson distribution". Statistics and Probability Letters, 79(24): 2493--2500. <doi:10.1016/j.spl.2009.09.003>.
NanoString nCounter is a gene expression assay that requires no enzymes or amplification protocols and works with fluorescent barcodes (Geiss et al. (2008) <doi:10.1038/nbt1385>). Each barcode is assigned to a messenger RNA or microRNA (mRNA/miRNA), which can be counted once the barcode has bound to its target. As a result, each count of a specific barcode represents the presence of its target mRNA/miRNA. NACHO (NAnoString quality Control dasHbOard) analyses exported NanoString nCounter data and helps the user perform quality control. NACHO does this by visualising quality control metrics, expression of control genes, principal components and sample-specific size factors in an interactive web application.
Makes it easy to build panel data in wide format from the raw data delivered by the Panel Study of Income Dynamics (PSID). Downloads data directly from the PSID server using the SAScii package. psidR takes care of merging data from each wave onto a cross-period index file, so that individuals can be followed over time. The user must specify which years they are interested in, and the PSID variable names (e.g. ER21003) for each year (they differ across years). The package offers helper functions to retrieve variable names from different waves. Different panel data designs and sample subsetting criteria are implemented ("SRC", "SEO", "immigrant" and "latino" samples). More information about the PSID can be obtained at <https://simba.isr.umich.edu/data/data.aspx>.
There are three main goals to the vctrs package: (1) To propose vec_size() and vec_type() as alternatives to length() and class(). These definitions are paired with a framework for type-coercion and size-recycling. (2) To define type- and size-stability as desirable function properties, use them to analyse existing base functions, and to propose better alternatives. This work has been particularly motivated by thinking about the ideal properties of c(), ifelse(), and rbind(). (3) To provide a new vctr base class that makes it easy to create new S3 vectors. vctrs provides methods for many base generics in terms of a few new vctrs generics, making implementation considerably simpler and more robust.
TENET identifies key transcription factors (TFs) and regulatory elements (REs) linked to a specific cell type by finding significantly correlated differences in gene expression and RE DNA methylation between case and control input datasets, and identifying the top genes by number of significant RE DNA methylation site links. It also includes many tools for visualization and analysis of the results, including plots displaying and comparing methylation and expression data and methylation site link counts, survival analysis, TF motif searching in the vicinity of linked RE DNA methylation sites, custom TAD and peak overlap analysis, and UCSC Genome Browser track file generation. A utility function is also provided to download methylation, expression, and patient survival data from The Cancer Genome Atlas (TCGA) for use in TENET or other analyses.
This package provides a non-parametric framework based on the principles of estimation statistics. Its main purpose is to infer orders of empirical distributions from different categories, based on the probability of finding a value in one distribution that is greater than the expectation of another distribution. Given a set of ordered pairs of (real value, category), the framework is capable of 1) inferring orders of domination of categories and representing the orders in the form of a graph; 2) estimating the magnitude of difference between a pair of categories in the form of mean-difference confidence intervals; and 3) visualizing domination orders and magnitudes of difference of categories. The package is described in Chainarong Amornbunchornvej, Navaporn Surasvadi, Anon Plangprasopchok, and Suttipong Thajchayapong (2020) <doi:10.1016/j.heliyon.2020.e05435>.
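The core quantity described above, the probability that a value from one category exceeds the expectation of another, can be estimated empirically as in the following Python sketch. This is only an illustration under simple assumptions (plain sample mean, no resampling or confidence intervals); the function name `prob_greater_than_mean` is hypothetical and the package's actual estimator may differ.

```python
def prob_greater_than_mean(a, b):
    """Fraction of values in a that exceed the sample mean of b."""
    mean_b = sum(b) / len(b)
    return sum(1 for v in a if v > mean_b) / len(a)


category_a = [5.0, 6.0, 7.0, 1.0]
category_b = [1.0, 2.0, 3.0]
print(prob_greater_than_mean(category_a, category_b))  # 0.75
print(prob_greater_than_mean(category_b, category_a))  # 0.0
```

A value near 1 in one direction and near 0 in the other suggests that the first category dominates the second in the sense described above.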
Runs the eDITH (environmental DNA Integrating Transport and Hydrology) model, which implements a mass balance of environmental DNA (eDNA) transport at a river network scale coupled with a species distribution model to obtain maps of species distribution. eDITH can work with either eDNA concentration (e.g., obtained via quantitative polymerase chain reaction) or metabarcoding (read count) data. Parameter estimation can be performed via Bayesian techniques (via the BayesianTools package) or optimization algorithms. An interface to the DHARMa package for posterior predictive checks is provided. See Carraro and Altermatt (2024) <doi:10.1111/2041-210X.14317> for a package introduction; Carraro et al. (2018) <doi:10.1073/pnas.1813843115> and Carraro et al. (2020) <doi:10.1038/s41467-020-17337-8> for methodological details.