Approximate marginal maximum likelihood estimation of multidimensional latent variable models via adaptive quadrature or Laplace approximations to the integrals in the likelihood function, as presented for confirmatory factor analysis models in Jin, S., Noh, M., and Lee, Y. (2018) <doi:10.1080/10705511.2017.1403287>, for item response theory models in Andersson, B., and Xin, T. (2021) <doi:10.3102/1076998620945199>, and for generalized linear latent variable models in Andersson, B., Jin, S., and Zhang, M. (2023) <doi:10.1016/j.csda.2023.107710>. Models implemented include the generalized partial credit model, the graded response model, and generalized linear latent variable models for Poisson, negative-binomial and normal distributions. Supports a combination of binary, ordinal, count and continuous observed variables and multiple group models.
Provides a data-driven estimation framework for the multi-threshold accelerated failure time (MTAFT) model. The MTAFT model features different linear forms in different subdomains, and one of the major challenges is determining the number of threshold effects. The package introduces a data-driven approach that utilizes a Schwarz information criterion, which demonstrates consistency under mild conditions. Additionally, a cross-validation (CV) criterion with an order-preserved sample-splitting scheme is proposed to achieve consistent estimation, without the need for additional parameters. The package establishes the asymptotic properties of the parameter estimates and includes an efficient score-type test to examine the existence of threshold effects. The methodologies are supported by numerical experiments and theoretical results, showcasing their reliable performance in finite-sample cases.
Annotates single-cell and spatial-transcriptomic (ST) data using marker datasets. Supports unified markers list ('Markers_list') creation from built-in databases (e.g., 'Cellmarker2', 'PanglaoDB', 'scIBD', 'TCellSI'), Seurat objects, or user-supplied Excel files. SlimR can predict calculation parameters using machine learning algorithms (e.g., 'Random Forest', 'Gradient Boosting', 'Support Vector Machine', 'Ensemble Learning'). Based on the 'Markers_list', it calculates gene expression for different cell types, predicts annotation information, calculates the corresponding AUC, annotates the data, and then verifies the annotation. It can also calculate the gene expression corresponding to each cell type to generate a reference map for manual annotation (e.g., 'Heat Map', 'Feature Plots', 'Combined Plots'). For more details see Kabacoff (2020, ISBN:9787115420572).
This package implements empirical Bayes approaches to genotype polyploids from next generation sequencing data while accounting for allele bias, overdispersion, and sequencing error. The main functions are flexdog() and multidog(), which allow the specification of many different genotype distributions. Also provided are functions to simulate genotypes, rgeno(), and read-counts, rflexdog(), as well as functions to calculate oracle genotyping error rates, oracle_mis(), and correlation with the true genotypes, oracle_cor(). These latter two functions are useful for read depth calculations. Run browseVignettes(package = "updog") in R for example usage. See Gerard et al. (2018) <doi:10.1534/genetics.118.301468> and Gerard and Ferrao (2020) <doi:10.1093/bioinformatics/btz852> for details on the implemented methods.
dinoR tests for significant differences in NOMe-seq footprints between two conditions, using genomic regions of interest (ROI) centered around a landmark, for example a transcription factor (TF) motif. This package takes NOMe-seq data (GCH methylation/protection) in the form of a Ranged Summarized Experiment as input. dinoR can be used to group sequencing fragments into 3 or 5 categories representing characteristic footprints (TF bound, nucleosome bound, open chromatin), plot the percentage of fragments in each category in a heatmap, or averaged across different ROI groups, for example, containing a common TF motif. It is designed to compare footprints between two sample groups, using edgeR's quasi-likelihood methods on the total fragment counts per ROI, sample, and footprint category.
Genome-wide studies of translational control are emerging as a tool to study various biological conditions. The output from such an analysis is both the mRNA level (e.g. cytosolic mRNA level) and the level of mRNA actively involved in translation (the actively translating mRNA level) for each mRNA. The standard analysis of such data strives towards identifying differential translation between two or more sample classes, i.e., differences in actively translated mRNA levels that are independent of underlying differences in cytosolic mRNA levels. This package allows for such analysis using partial variances and the random variance model. As tens of thousands of mRNAs are analyzed in parallel, the library performs a number of tests to ensure that the data set is suitable for such analysis.
This package provides a set of psychometric tools for cognitive diagnosis modeling based on the generalized deterministic inputs, noisy and gate (G-DINA) model by de la Torre (2011) <doi:10.1007/s11336-011-9207-7> and its extensions, including the sequential G-DINA model by Ma and de la Torre (2016) <doi:10.1111/bmsp.12070> for polytomous responses, and the polytomous G-DINA model by Chen and de la Torre <doi:10.1177/0146621613479818> for polytomous attributes. Joint attribute distribution can be independent, saturated, higher-order, loglinear smoothed or structured. Q-matrix validation, item and model fit statistics, model comparison at test and item level and differential item functioning can also be conducted. A graphical user interface is also provided.
This package provides a computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well-defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman's random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available.
Implements integrative analysis methods based on a two-part penalization, which perform dimension reduction and mine the heterogeneity and association of multiple studies with compatible designs. The package provides integrative sparse principal component analysis (Fang et al., 2018), integrative sparse partial least squares (Liang et al., 2021) and integrative sparse canonical correlation analysis, as well as the corresponding individual analysis and meta-analysis versions. References: (1) Fang, K., Fan, X., Zhang, Q., and Ma, S. (2018). Integrative sparse principal component analysis. Journal of Multivariate Analysis, <doi:10.1016/j.jmva.2018.02.002>. (2) Liang, W., Ma, S., Zhang, Q., and Zhu, T. (2021). Integrative sparse partial least squares. Statistics in Medicine, <doi:10.1002/sim.8900>.
This is a method (MinED) for mining probability distributions using deterministic sampling, proposed by Joseph, Wang, Gu, Lv, and Tuo (2019) <DOI:10.1080/00401706.2018.1552203>. The MinED samples can be used for approximating the target distribution. They can be generated from a density function that is known only up to a proportionality constant, so the method may find applications in Bayesian computation. Moreover, the MinED samples are generated with far fewer evaluations of the density function than random sampling-based methods such as MCMC, so the method is especially useful when the unnormalized posterior is expensive or time-consuming to evaluate. This research is supported by a U.S. National Science Foundation grant DMS-1712642.
Bayesian supervised predictive classifiers, hypothesis testing, and parametric estimation under Partition Exchangeability are implemented. The two classifiers presented are the marginal classifier (which assumes test data are i.i.d.) and a more computationally costly but more accurate simultaneous classifier (which finds a labelling for the entire test dataset at once, based on simultaneous use of all the test data to predict each label). We also provide the Maximum Likelihood Estimation (MLE) of the only underlying parameter of the partition exchangeability generative model, as well as hypothesis testing statistics for equality of this parameter with a single value, alternative, or multiple samples. We present functions to simulate sequences from the Ewens Sampling Formula as the realisation of the Poisson-Dirichlet distribution and to compute their respective probabilities.
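As a rough illustration of the simulation step mentioned above, the following Python sketch draws a random partition with the Chinese restaurant process and evaluates the Ewens Sampling Formula probability of its block sizes. This is not the package's code: the function names `crp_partition` and `ewens_probability` are hypothetical, and the sketch assumes a single dispersal parameter `theta`.

```python
import math
import random
from collections import Counter


def crp_partition(n, theta, rng):
    """Draw block sizes of a random partition of n items.

    Uses the Chinese restaurant process: item i (0-based) opens a new block
    with probability theta / (theta + i), otherwise it joins an existing
    block with probability proportional to that block's size.
    """
    blocks = []
    for i in range(n):
        u = rng.random() * (theta + i)
        if u < theta:
            blocks.append(1)
            continue
        u -= theta
        for k, size in enumerate(blocks):
            if u < size:
                blocks[k] += 1
                break
            u -= size
        else:
            blocks[-1] += 1  # guard against floating-point rounding
    return blocks


def ewens_probability(block_sizes, theta):
    """Ewens Sampling Formula probability of a partition given its block sizes."""
    n = sum(block_sizes)
    counts = Counter(block_sizes)  # a_j = number of blocks of size j
    rising = math.prod(theta + i for i in range(n))  # theta (theta+1) ... (theta+n-1)
    prob = math.factorial(n) / rising
    for j, a_j in counts.items():
        prob *= theta ** a_j / (j ** a_j * math.factorial(a_j))
    return prob


if __name__ == "__main__":
    sizes = crp_partition(20, theta=1.5, rng=random.Random(1))
    print(sizes, ewens_probability(sizes, theta=1.5))
```

Larger values of `theta` favour many small blocks, while values near zero concentrate the items in a few large blocks.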
This package provides a sensitivity analysis approach for unmeasured confounding in observational data with multiple treatments and a binary outcome. This approach derives the general bias formula and provides adjusted causal effect estimates in response to various assumptions about the degree of unmeasured confounding. Nested multiple imputation is embedded within the Bayesian framework to integrate uncertainty about the sensitivity parameters and sampling variability. Bayesian Additive Regression Trees (BART) are used for outcome modeling. The causal estimands are the conditional average treatment effects (CATE) based on the risk difference. For more details, see the paper: Hu L et al. (2020) A flexible sensitivity analysis approach for unmeasured confounding with multiple treatments and a binary outcome with application to SEER-Medicare lung cancer data <arXiv:2012.06093>.
This package implements the Diffusion Non-Additive (DNA) model proposed by Heo, Boutelet, and Sung (2025+) <doi:10.48550/arXiv.2506.08328> for multi-fidelity computer experiments with tuning parameters. The DNA model captures nonlinear dependencies across fidelity levels using Gaussian process priors and is particularly effective when simulations at different fidelity levels are nonlinearly correlated. The DNA model targets not only interpolation across given fidelity levels but also extrapolation to smaller tuning parameters, including the exact solution corresponding to a zero-valued tuning parameter, leveraging a nonseparable covariance kernel structure that models interactions between the tuning parameter and input variables. Closed-form expressions for the predictive mean and variance enable efficient inference and uncertainty quantification. Hyperparameters in the model are estimated via maximum likelihood estimation.
Provides a method based on the EM algorithm to estimate the parameters of a mixture model, the Sigmoid-Normal Model, in which the samples come from several normal distributions (also called subgroups) whose means are determined by covariates Z and coefficients alpha, while the variances are homogeneous. The subgroup each item belongs to is determined by covariates X and coefficients eta through a sigmoid link function, which is an extension of the logistic link function. The bootstrap is used to estimate the standard errors of the parameters. When the sample is indeed separable, the estimate of alpha is quite good once estimates with abnormal sigma are removed. I used this method to explore the subgroup structure of HIV patients, and it can be used in other domains where subgroup structure exists.
Maximum likelihood estimation, random value generation, density computation and other functions for the exponential-Poisson, generalised exponential-Poisson and Poisson-exponential distributions. References include: Rodrigues G. C., Louzada F. and Ramos P. L. (2018). "Poisson-exponential distribution: different methods of estimation". Journal of Applied Statistics, 45(1): 128--144. <doi:10.1080/02664763.2016.1268571>. Louzada F., Ramos, P. L. and Ferreira, H. P. (2020). "Exponential-Poisson distribution: estimation and applications to rainfall and aircraft data with zero occurrence". Communications in Statistics--Simulation and Computation, 49(4): 1024--1043. <doi:10.1080/03610918.2018.1491988>. Barreto-Souza W. and Cribari-Neto F. (2009). "A generalization of the exponential-Poisson distribution". Statistics and Probability Letters, 79(24): 2493--2500. <doi:10.1016/j.spl.2009.09.003>.
NanoString nCounter data come from gene expression assays that require no enzymes or amplification protocols and work with fluorescent barcodes (Geiss et al. (2008) <doi:10.1038/nbt1385>). Each barcode is assigned a messenger-RNA/micro-RNA (mRNA/miRNA), which can be counted after binding to its target. As a result, each count of a specific barcode represents the presence of its target mRNA/miRNA. NACHO (NAnoString quality Control dasHbOard) analyses the exported NanoString nCounter data and helps the user perform quality control. NACHO does this by visualising quality control metrics, expression of control genes, principal components and sample-specific size factors in an interactive web application.
Makes it easy to build panel data in wide format from raw data delivered by the Panel Study of Income Dynamics (PSID). Downloads data directly from the PSID server using the SAScii package. psidR takes care of merging data from each wave onto a cross-period index file, so that individuals can be followed over time. The user must specify which years they are interested in, and the PSID variable names (e.g. ER21003) for each year (they differ in each year). The package offers helper functions to retrieve variable names from different waves. Different panel data designs and sample subsetting criteria are implemented ("SRC", "SEO", "immigrant" and "latino" samples). More information about the PSID can be obtained at <https://simba.isr.umich.edu/data/data.aspx>.
Runs the eDITH (environmental DNA Integrating Transport and Hydrology) model, which implements a mass balance of environmental DNA (eDNA) transport at a river network scale coupled with a species distribution model to obtain maps of species distribution. eDITH can work with either eDNA concentration (e.g., obtained via quantitative polymerase chain reaction) or metabarcoding (read count) data. Parameter estimation can be performed via Bayesian techniques (via the BayesianTools package) or optimization algorithms. An interface to the DHARMa package for posterior predictive checks is provided. See Carraro and Altermatt (2024) <doi:10.1111/2041-210X.14317> for a package introduction; Carraro et al. (2018) <doi:10.1073/pnas.1813843115> and Carraro et al. (2020) <doi:10.1038/s41467-020-17337-8> for methodological details.
This package provides a non-parametric framework based on the principles of estimation statistics. Its main purpose is to infer orders of empirical distributions from different categories, based on the probability of finding a value in one distribution that is greater than the expectation of another distribution. Given a set of ordered pairs of real-category values, the framework is capable of 1) inferring orders of domination of categories and representing these orders in the form of a graph; 2) estimating the magnitude of difference between a pair of categories in the form of mean-difference confidence intervals; and 3) visualizing domination orders and magnitudes of difference between categories. The package is described in Chainarong Amornbunchornvej, Navaporn Surasvadi, Anon Plangprasopchok, and Suttipong Thajchayapong (2020) <doi:10.1016/j.heliyon.2020.e05435>.
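The two quantities named above can be illustrated with a small Python sketch. It is an illustration only, not the package's procedure: the function names, the plain percentile bootstrap, and the default of 2000 resamples are all assumptions made for this example.

```python
import random
from statistics import mean


def prob_exceeds_mean(xs, ys):
    """Empirical probability that a value drawn from xs exceeds the mean of ys."""
    mu_y = mean(ys)
    return sum(x > mu_y for x in xs) / len(xs)


def bootstrap_mean_diff_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(xs) - mean(ys)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        bx = rng.choices(xs, k=len(xs))  # resample each category with replacement
        by = rng.choices(ys, k=len(ys))
        diffs.append(mean(bx) - mean(by))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    a = [5.1, 6.3, 5.8, 7.0, 6.1]
    b = [3.9, 4.2, 5.0, 4.4, 4.8]
    print(prob_exceeds_mean(a, b))
    print(bootstrap_mean_diff_ci(a, b))
```

Under this sketch, one category would be said to dominate another when the interval for the mean difference lies entirely above zero. That decision rule is a simplification chosen for the example.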
Full Consistency Method (FUCOM) for multi-criteria decision-making (MCDM), developed by Dragan Pamucar in 2018 (<doi:10.3390/sym10090393>). The goal of the method is to determine the weights of criteria such that the deviation from full consistency is minimized. Users provide a character vector specifying the ranking of the criteria according to their significance, from the criterion expected to have the highest weight to the least significant one. Additionally, users provide a numeric vector specifying the priority value of each criterion. The comparison is made with respect to the first-ranked (most significant) criterion. The function returns the optimized weights for each criterion (summing to 1), the comparative priority (Phi) values, the mathematical transitivity condition (w) value, and the minimum deviation from full consistency (DFC).
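To make the inputs and outputs concrete, here is a minimal Python sketch of the special case in which every priority is stated relative to the first-ranked criterion. In that case the weights proportional to the reciprocal priorities satisfy both the comparative-priority and the transitivity conditions exactly, so the deviation from full consistency is zero and no numerical optimizer is needed. The function name `fucom_weights` and the returned dictionary layout are hypothetical and do not reflect the package's interface.

```python
def fucom_weights(criteria, priorities):
    """Closed-form FUCOM-style weights for priorities given relative to the top criterion.

    criteria   : criterion names, ordered from most to least significant
    priorities : positive numbers, non-decreasing, one per criterion
    """
    if len(criteria) != len(priorities):
        raise ValueError("criteria and priorities must have the same length")
    if any(p <= 0 for p in priorities):
        raise ValueError("priorities must be positive")

    # Comparative priorities between consecutive criteria: phi_k = p_{k+1} / p_k
    phi = [priorities[k + 1] / priorities[k] for k in range(len(priorities) - 1)]

    # Weights proportional to 1 / p_k give w_k / w_{k+1} = phi_k for every k
    inverse = [1.0 / p for p in priorities]
    total = sum(inverse)
    weights = [v / total for v in inverse]

    # Deviation from full consistency: largest violation of w_k / w_{k+1} = phi_k
    dfc = max(
        (abs(weights[k] / weights[k + 1] - phi[k]) for k in range(len(phi))),
        default=0.0,
    )
    return {"weights": dict(zip(criteria, weights)), "phi": phi, "dfc": dfc}


if __name__ == "__main__":
    result = fucom_weights(["C1", "C2", "C3"], [1, 2, 4])
    print(result["weights"], result["phi"], result["dfc"])
```

The general method is stated as a constrained minimization of the deviation. This sketch covers only the fully consistent case and should be read as an illustration of what the returned quantities mean.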
Estimates statistically significant marker combination values within which one immunologically distinctive group (i.e., disease case) is more associated than another group (i.e., healthy control), successively, using various combinations (i.e., "gates") of markers to examine features of cells that may be different between groups. For a two-group comparison, the gateR package uses the spatial relative risk function estimated using the sparr package. Details about the sparr package methods can be found in the tutorial: Davies et al. (2018) <doi:10.1002/sim.7577>. Details about kernel density estimation can be found in J. F. Bithell (1990) <doi:10.1002/sim.4780090616>. More information about relative risk functions using kernel density estimation can be found in J. F. Bithell (1991) <doi:10.1002/sim.4780101112>.
Fits a generalized linear density ratio model (GLDRM). A GLDRM is a semiparametric generalized linear model. In contrast to a GLM, which assumes a particular exponential family distribution, the GLDRM uses a semiparametric likelihood to estimate the reference distribution. The reference distribution may be any discrete, continuous, or mixed exponential family distribution. The model parameters, which include both the regression coefficients and the cdf of the unspecified reference distribution, are estimated by maximizing a semiparametric likelihood. Regression coefficients are estimated with no loss of efficiency, i.e. the asymptotic variance is the same as if the true exponential family distribution were known. Huang (2014) <doi:10.1080/01621459.2013.824892>. Huang and Rathouz (2012) <doi:10.1093/biomet/asr075>. Rathouz and Gao (2008) <doi:10.1093/biostatistics/kxn030>.
An implementation of several machine learning algorithms for multivariate time series. The package includes functions allowing the execution of clustering, classification or outlier detection methods, among others. It also incorporates a collection of multivariate time series datasets which can be used to analyse the performance of newly proposed algorithms. Some of these datasets are stored in GitHub data packages 'ueadata1' to 'ueadata8'. To access these data packages, run 'install.packages(c('ueadata1', 'ueadata2', 'ueadata3', 'ueadata4', 'ueadata5', 'ueadata6', 'ueadata7', 'ueadata8'), repos='<https://anloor7.github.io/drat/>')'. The installation takes a couple of minutes, but we strongly encourage users to do it if they want all datasets of 'mlmts' to be available. Practitioners from a broad variety of fields could benefit from the general framework provided by 'mlmts'.
This package implements (1) panel cointegration rank tests, (2) estimators for panel vector autoregressive (VAR) models, and (3) identification methods for panel structural vector autoregressive (SVAR) models as described in the accompanying vignette. The implemented functions can account for cross-sectional dependence and for structural breaks in the deterministic terms of the VAR processes. Among the large set of functions, particularly noteworthy are those that implement (1) the correlation-augmented inverse normal test on the cointegration rank by Arsova and Oersal (2021, <doi:10.1016/j.ecosta.2020.05.002>), (2) the two-step estimator for pooled cointegrating vectors by Breitung (2005, <doi:10.1081/ETC-200067895>), and (3) the pooled identification based on independent component analysis by Herwartz and Wang (2024, <doi:10.1002/jae.3044>).