dinoR tests for significant differences in NOMe-seq footprints between two conditions, using genomic regions of interest (ROI) centered around a landmark, for example a transcription factor (TF) motif. This package takes NOMe-seq data (GCH methylation/protection) in the form of a RangedSummarizedExperiment as input. dinoR can be used to group sequencing fragments into 3 or 5 categories representing characteristic footprints (TF bound, nucleosome bound, open chromatin), and to plot the percentage of fragments in each category in a heatmap, or averaged across different ROI groups, for example those containing a common TF motif. It is designed to compare footprints between two sample groups, using edgeR's quasi-likelihood methods on the total fragment counts per ROI, sample, and footprint category.
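The statistical core named in the last sentence can be illustrated with a self-contained toy example using edgeR's quasi-likelihood pipeline on per-ROI counts in one footprint category (dinoR wraps this per category; this is not its API):

    library(edgeR)
    set.seed(1)
    # toy per-ROI fragment counts for one footprint category,
    # two WT and two KO samples
    counts <- matrix(rnbinom(40, mu = 50, size = 10), nrow = 10,
                     dimnames = list(paste0("ROI", 1:10), NULL))
    group <- factor(c("WT", "WT", "KO", "KO"))
    y <- DGEList(counts = counts, group = group)
    y <- calcNormFactors(y)                      # TMM normalisation
    fit <- glmQLFit(y, model.matrix(~ group))    # quasi-likelihood fit
    topTags(glmQLFTest(fit, coef = 2))           # WT vs KO per ROI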
Calculates concentration and dispersion in ordered rating scales. It implements various measures of concentration and dispersion to describe what researchers variably call agreement, concentration, consensus, dispersion, or polarization among respondents in ordered data. It also implements other related measures to classify distributions. In addition to a generic city-block based concentration measure and a generic dispersion measure, the package implements van der Eijk's (2001) <DOI:10.1023/A:1010374114305> measure of agreement A, as well as measures of concentration by Leik, Tastle and Wierman, Blair and Lacy, Kvalseth, Berry and Mielke, Reardon, and Garcia-Montalvo and Reynal-Querol. Furthermore, the package provides an implementation of Galtung's AJUS system to classify distributions, as well as a function to identify the position of multiple modes.
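The mode-finding feature at the end is simple enough to sketch from scratch (this helper is illustrative, not the package's API):

    # return the positions of all categories tied for maximum frequency
    find_modes <- function(freq) which(freq == max(freq))
    find_modes(c(10, 25, 5, 25, 3))  # bimodal: categories 2 and 4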
This package provides a genome-wide survival framework that integrates sequential conditional independent tuples and a saddlepoint approximation method to provide SNP-level false discovery rate control while improving power, particularly for biobank-scale survival analyses with low event rates. The method is based on model-X knockoffs as described in Barber and Candes (2015) <doi:10.1214/15-AOS1337> and fast survival analysis methods from Bi et al. (2020) <doi:10.1016/j.ajhg.2020.06.003>. A shrinkage algorithmic-leveraging approach accelerates the generation of multiple knockoffs in large genetic cohorts. This CRAN version uses standard Cox regression for association testing. For enhanced performance on very large datasets, users may optionally install the SPACox package from GitHub, which provides saddlepoint approximation methods for survival analysis.
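The selection rule of the cited knockoff framework can be sketched generically (this is the knockoff+ rule of Barber and Candes, not this package's interface):

    # W_j > 0 favours variable j over its knockoff; pick the smallest
    # threshold whose estimated FDR is at most the target level
    knockoff_threshold <- function(W, fdr = 0.1) {
      ts <- sort(unique(abs(W[W != 0])))
      ok <- vapply(ts, function(t)
        (1 + sum(W <= -t)) / max(1, sum(W >= t)) <= fdr, logical(1))
      if (any(ok)) ts[which(ok)[1]] else Inf
    }
    W <- c(3.2, -0.4, 1.8, 0.1, -1.1, 2.5)
    which(W >= knockoff_threshold(W, fdr = 0.4))  # selected variables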
Approximate marginal maximum likelihood estimation of multidimensional latent variable models via adaptive quadrature or Laplace approximations to the integrals in the likelihood function, as presented for confirmatory factor analysis models in Jin, S., Noh, M., and Lee, Y. (2018) <doi:10.1080/10705511.2017.1403287>, for item response theory models in Andersson, B., and Xin, T. (2021) <doi:10.3102/1076998620945199>, and for generalized linear latent variable models in Andersson, B., Jin, S., and Zhang, M. (2023) <doi:10.1016/j.csda.2023.107710>. Models implemented include the generalized partial credit model, the graded response model, and generalized linear latent variable models for Poisson, negative-binomial and normal distributions. Supports a combination of binary, ordinal, count and continuous observed variables and multiple group models.
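As a minimal one-dimensional illustration of the quadrature idea behind these estimators (the package itself uses adaptive quadrature or Laplace approximations in several dimensions), the marginal likelihood of a two-parameter logistic IRT response pattern can be approximated with Gauss-Hermite nodes from the statmod package:

    library(statmod)
    gh <- gauss.quad(21, kind = "hermite")
    # marginal log-likelihood of one response pattern y under a 2PL model
    # with discriminations a, difficulties b, and theta ~ N(0, 1);
    # Gauss-Hermite rule: integral f(x) exp(-x^2) dx ~ sum w_i f(x_i)
    marg_loglik <- function(y, a, b) {
      lik <- function(theta)
        prod(plogis(a * (theta - b))^y * (1 - plogis(a * (theta - b)))^(1 - y))
      theta <- sqrt(2) * gh$nodes          # change of variables for N(0, 1)
      log(sum(gh$weights / sqrt(pi) * sapply(theta, lik)))
    }
    marg_loglik(y = c(1, 0, 1), a = c(1, 1.2, 0.8), b = c(-0.5, 0, 0.5))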
This package implements empirical Bayes approaches to genotype polyploids from next generation sequencing data while accounting for allele bias, overdispersion, and sequencing error. The main functions are flexdog() and multidog(), which allow the specification of many different genotype distributions. Also provided are functions to simulate genotypes, rgeno(), and read-counts, rflexdog(), as well as functions to calculate oracle genotyping error rates, oracle_mis(), and correlation with the true genotypes, oracle_cor(). These latter two functions are useful for read depth calculations. Run browseVignettes(package = "updog") in R for example usage. See Gerard et al. (2018) <doi:10.1534/genetics.118.301468> and Gerard and Ferrao (2020) <doi:10.1093/bioinformatics/btz852> for details on the implemented methods.
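A minimal simulate-then-genotype round trip with the functions named above (argument names as recalled from the package documentation; see ?flexdog):

    library(updog)
    set.seed(1)
    ploidy <- 4
    geno  <- rgeno(n = 100, ploidy = ploidy, model = "hw", allele_freq = 0.4)
    sizes <- rep(50, 100)                         # read depth per individual
    refs  <- rflexdog(sizevec = sizes, geno = geno, ploidy = ploidy)
    # genotype the simulated individuals and compare with the truth
    fit <- flexdog(refvec = refs, sizevec = sizes, ploidy = ploidy,
                   model = "norm")
    table(estimated = fit$geno, true = geno)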
This package provides a local execution environment for testing and developing FaaSr workflows without requiring cloud infrastructure. The FaaSr package enables R developers to validate and test workflows locally before deploying to Function-as-a-Service (FaaS) platforms. Key features include: 1) parsing and validating JSON workflow configurations compliant with the FaaSr schema; 2) simulated S3 storage operations using the local file system, with local logging; 3) support for conditional branching; 4) support for parallel rank function execution; 5) workflow cycle detection and validation; and 6) no need for cloud credentials or infrastructure during testing. This package is designed for development and testing purposes. For production deployment to cloud FaaS platforms, use the main FaaSr package available at <https://faasr.io/>.
This package provides likelihood-based inference for Gaussian and Student-t copula models for univariate count time series. Supports Poisson, negative binomial, binomial, beta-binomial, and zero-inflated marginals with ARMA dependence structures. Includes simulation, maximum-likelihood estimation, residual diagnostics, and predictive inference. Implements Time Series Minimax Exponential Tilting (TMET) <doi:10.1016/j.csda.2026.108344>, an adaptation of minimax exponential tilting of Botev (2017) <doi:10.1111/rssb.12162>. Also provides a linear-cost implementation of the Geweke-Hajivassiliou-Keane (GHK) simulator following Masarotto and Varin (2012) <doi:10.1214/12-EJS721>, and the Continuous Extension (CE) approximation of Nguyen and De Oliveira (2025) <doi:10.1080/02664763.2025.2498502>. The package follows the S3 design philosophy of gcmr but is developed independently.
Implements integrative analysis methods based on a two-part penalization, which achieve dimension reduction and mine the heterogeneity and association of multiple studies with compatible designs. The package provides integrative analysis methods including integrative sparse principal component analysis (Fang et al., 2018), integrative sparse partial least squares (Liang et al., 2021), and integrative sparse canonical correlation analysis, as well as corresponding individual analysis and meta-analysis versions. References: (1) Fang, K., Fan, X., Zhang, Q., and Ma, S. (2018). Integrative sparse principal component analysis. Journal of Multivariate Analysis, <doi:10.1016/j.jmva.2018.02.002>. (2) Liang, W., Ma, S., Zhang, Q., and Zhu, T. (2021). Integrative sparse partial least squares. Statistics in Medicine, <doi:10.1002/sim.8900>.
Bayesian supervised predictive classifiers, hypothesis testing, and parametric estimation under Partition Exchangeability are implemented. The two classifiers presented are the marginal classifier (which assumes the test data are i.i.d.) and a more computationally costly but more accurate simultaneous classifier (which finds a labelling for the entire test dataset at once, based on the simultaneous use of all the test data to predict each label). We also provide the Maximum Likelihood Estimate (MLE) of the only underlying parameter of the partition exchangeability generative model, as well as hypothesis testing statistics for the equality of this parameter with a single value, with another sample, or across multiple samples. We present functions to simulate sequences from the Ewens Sampling Formula as realisations of the Poisson-Dirichlet distribution, along with their respective probabilities.
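The Ewens Sampling Formula simulation mentioned at the end admits a compact "Chinese restaurant" construction; a from-scratch sketch (not the package's simulator), with psi denoting the model's single underlying parameter:

    rewens <- function(n, psi) {
      labels <- integer(n); labels[1] <- 1L; k <- 1L
      for (i in 2:n) {
        if (runif(1) < psi / (psi + i - 1)) {        # open a new cluster
          k <- k + 1L; labels[i] <- k
        } else {                                     # join an existing one,
          labels[i] <- labels[sample.int(i - 1, 1)]  # proportional to size
        }
      }
      labels
    }
    table(rewens(100, psi = 2))  # cluster sizes of one sampled partition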
This package provides a sensitivity analysis approach for unmeasured confounding in observational data with multiple treatments and a binary outcome. This approach derives the general bias formula and provides adjusted causal effect estimates in response to various assumptions about the degree of unmeasured confounding. Nested multiple imputation is embedded within the Bayesian framework to integrate uncertainty about the sensitivity parameters and sampling variability. Bayesian Additive Regression Trees (BART) are used for outcome modeling. The causal estimands are conditional average treatment effects (CATE) based on the risk difference. For more details, see Hu L et al. (2020), "A flexible sensitivity analysis approach for unmeasured confounding with multiple treatments and a binary outcome with application to SEER-Medicare lung cancer data" <arXiv:2012.06093>.
Analysis and visualization of dropout between conditions in surveys and (online) experiments. Features include computation of dropout statistics, comparing dropout between conditions (e.g. chi-squared tests), analyzing survival (e.g. Kaplan-Meier estimation), comparing the conditions with the most different rates of dropout (Kolmogorov-Smirnov), and visualizing the result of each in designated plotting functions. An article on dropR was published by the authors in Behavior Research Methods: "Dropout analysis: A method for data from Internet-based research and dropR, an R-based web app and package to analyze and visualize dropout" (2025) <doi:10.3758/s13428-025-02730-2>. Sources: Andrea Frick, Marie-Terese Baechtiger & Ulf-Dietrich Reips (2001) <doi:10.5167/uzh-19758>; Ulf-Dietrich Reips (2002) <doi:10.1026//1618-3169.49.4.243>.
This package implements the Diffusion Non-Additive (DNA) model proposed by Heo, Boutelet, and Sung (2025+) <doi:10.48550/arXiv.2506.08328> for multi-fidelity computer experiments with tuning parameters. The DNA model captures nonlinear dependencies across fidelity levels using Gaussian process priors and is particularly effective when simulations at different fidelity levels are nonlinearly correlated. The DNA model targets not only interpolation across given fidelity levels but also extrapolation to smaller tuning parameters, including the exact solution corresponding to a zero-valued tuning parameter, leveraging a nonseparable covariance kernel structure that models interactions between the tuning parameter and input variables. Closed-form expressions for the predictive mean and variance enable efficient inference and uncertainty quantification. Hyperparameters in the model are estimated via maximum likelihood estimation.
Provides a method, based on the EM algorithm, to estimate the parameters of a mixture model, the Sigmoid-Normal model, where the samples come from several normal distributions (subgroups) whose means are determined by the covariates Z and coefficients alpha, while the variances are homogeneous. The subgroup each item belongs to is determined by the covariates X and coefficients eta through a sigmoid link function, an extension of the logistic link function. Bootstrap is used to estimate the standard errors of the parameters. When the sample is indeed separable, after removing estimates with abnormal sigma, the estimates of alpha are quite accurate. The method has been used to explore the subgroup structure of HIV patients and can be applied in other domains where a subgroup structure exists.
This package implements a basic version of the hierarchical clustering algorithm Genie which links two point groups in such a way that an inequity measure (namely, the Gini index) of the cluster sizes does not significantly increase above a given threshold. This method most often outperforms many other data segmentation approaches in terms of clustering quality as tested on a wide range of benchmark datasets. At the same time, Genie retains the high speed of the single linkage approach, therefore it is also suitable for analysing larger data sets. For more details see (Gagolewski et al. 2016 <DOI:10.1016/j.ins.2016.05.003>). For a faster and more feature-rich implementation, see the genieclust package (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>).
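A quick usage sketch via the faster genieclust implementation mentioned at the end (interface as recalled from its documentation; gclust() returns a standard hclust object, and gini_threshold is the inequity threshold described here):

    library(genieclust)
    X <- as.matrix(iris[, 1:4])
    h <- gclust(X, gini_threshold = 0.3)   # Genie hierarchical clustering
    table(cutree(h, k = 3), iris$Species)  # cut into 3 clusters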
Maximum likelihood estimation, random values generation, density computation and other functions for the exponential-Poisson, generalised exponential-Poisson, and Poisson-exponential distributions. References include: Rodrigues G. C., Louzada F. and Ramos P. L. (2018). "Poisson-exponential distribution: different methods of estimation". Journal of Applied Statistics, 45(1): 128--144. <doi:10.1080/02664763.2016.1268571>. Louzada F., Ramos, P. L. and Ferreira, H. P. (2020). "Exponential-Poisson distribution: estimation and applications to rainfall and aircraft data with zero occurrence". Communications in Statistics--Simulation and Computation, 49(4): 1024--1043. <doi:10.1080/03610918.2018.1491988>. Barreto-Souza W. and Cribari-Neto F. (2009). "A generalization of the exponential-Poisson distribution". Statistics and Probability Letters, 79(24): 2493--2500. <doi:10.1016/j.spl.2009.09.003>.
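For reference, the exponential-Poisson density that Barreto-Souza and Cribari-Neto (2009) generalise can be coded directly from its formula; a from-scratch sketch (the package's own d/p/q/r-style functions are the documented interface):

    # exponential-Poisson density, rate beta > 0, shape lambda > 0
    dep <- function(x, lambda, beta)
      (lambda * beta / (1 - exp(-lambda))) *
        exp(-lambda - beta * x + lambda * exp(-beta * x))
    integrate(dep, 0, Inf, lambda = 2, beta = 1)  # ~ 1, sanity check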
This package implements Multivariable Functional Mendelian Randomization (MV-FMR) to estimate time-varying causal effects of multiple longitudinal exposures on health outcomes. Extends univariable functional Mendelian Randomization (MR) (Tian et al., 2024 <doi:10.1002/sim.10222>) to the multivariable setting, enabling joint estimation of multiple time-varying exposures with pleiotropy and mediation scenarios. Key features include: (1) data-driven cross-validation for basis component selection, (2) handling of mediation pathways between exposures, (3) support for both continuous and binary outcomes using Generalized Method of Moments (GMM) and control function approaches, (4) one-sample and two-sample MR designs, (5) bootstrap inference and instrument diagnostics including Q-statistics for overidentification testing. Methods are described in Fontana et al. (2025) <doi:10.48550/arXiv.2512.19064>.
NanoString nCounter data are gene expression assays that require no enzymes or amplification protocols and work with fluorescent barcodes (Geiss et al. (2008) <doi:10.1038/nbt1385>). Each barcode is assigned to a messenger RNA/micro RNA (mRNA/miRNA), which can be counted after binding to its target. As a result, each count of a specific barcode represents the presence of its target mRNA/miRNA. NACHO (NAnoString quality Control dasHbOard) analyses exported NanoString nCounter data and assists the user in performing quality control. NACHO does this by visualising quality control metrics, expression of control genes, principal components, and sample-specific size factors in an interactive web application.
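A typical NACHO session, as recalled from the package documentation (the vignette is authoritative for argument names):

    library(NACHO)
    # import RCC files listed in a sample sheet, then open the QC dashboard
    nacho <- load_rcc(data_directory = "path/to/rcc",
                      ssheet_csv = "samplesheet.csv",
                      id_colname = "IDFILE")
    visualise(nacho)  # interactive app: QC metrics, control genes,
                      # principal components, size factors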
This package provides smooth approximations to the L0 norm penalty for estimating sparse Gaussian graphical models (GGMs). Network estimation is performed using the Local Linear Approximation (LLA) framework (Fan & Li, 2001 <doi:10.1198/016214501753382273>; Zou & Li, 2008 <doi:10.1214/009053607000000802>) with five penalty functions: arctangent (Wang & Zhu, 2016 <doi:10.1155/2016/6495417>), EXP (Wang, Fan, & Zhu, 2018 <doi:10.1007/s10463-016-0588-3>), Gumbel, Log (Candes, Wakin, & Boyd, 2008 <doi:10.1007/s00041-008-9045-x>), and Weibull. Adaptive penalty parameters for EXP, Gumbel, and Weibull are estimated via maximum likelihood, and model selection uses information criteria including AIC, BIC, and EBIC (Extended BIC). Simulation functions generate multivariate normal data from GGMs with stochastic block model or small-world (Watts-Strogatz) network structures.
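The LLA framework cited above replaces the folded-concave penalty at each iteration with a weighted L1 penalty, the weights being the penalty's derivative at the current estimates. A sketch of one of the listed penalties, arctangent, up to the overall tuning parameter:

    # arctangent penalty and its derivative in |theta|; in an LLA step the
    # derivative, evaluated at the current estimates, supplies the weights
    # of a weighted graphical-lasso subproblem (gamma is a shape parameter)
    pen_atan  <- function(theta, gamma = 0.1) (2 / pi) * atan(abs(theta) / gamma)
    dpen_atan <- function(theta, gamma = 0.1) (2 / pi) * gamma / (gamma^2 + theta^2)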
Makes it easy to build panel data in wide format from Panel Study of Income Dynamics (PSID) delivered raw data. Downloads data directly from the PSID server using the SAScii package. psidR takes care of merging data from each wave onto a cross-period index file, so that individuals can be followed over time. The user must specify which years they are interested in, and the PSID variable names (e.g. ER21003) for each year (these differ across years). The package offers helper functions to retrieve variable names from different waves. Different panel data designs and sample subsetting criteria are implemented ("SRC", "SEO", "immigrant" and "latino" samples). More information about the PSID can be obtained at <https://simba.isr.umich.edu/data/data.aspx>.
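A hedged sketch of a typical psidR call (argument details recalled from the package documentation; "ER25003", the 2005 variable name below, is hypothetical):

    library(psidR)
    # map one family-level variable across waves: ER21003 is its 2003 name,
    # and the package's helper functions look up names for other waves
    fam.vars <- data.frame(year   = c(2003, 2005),
                           faminc = c("ER21003", "ER25003"))
    d <- build.panel(datadir = "~/psid", fam.vars = fam.vars, design = "all")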
There are three main goals to the vctrs package: 1) To propose vec_size() and vec_type() as alternatives to length() and class(); these definitions are paired with a framework for type-coercion and size-recycling. 2) To define type- and size-stability as desirable function properties, use them to analyse existing base functions, and to propose better alternatives; this work has been particularly motivated by thinking about the ideal properties of c(), ifelse(), and rbind(). 3) To provide a new vctr base class that makes it easy to create new S3 vectors; vctrs provides methods for many base generics in terms of a few new vctrs generics, making implementation considerably simpler and more robust.
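A brief taste of the first and third goals, adapted from the pattern used in the vctrs vignettes:

    library(vctrs)
    vec_size(data.frame(x = 1:3, y = 4:6))  # 3: rows, where length() gives 2
    # a minimal custom S3 vector on top of the vctr base class
    percent <- function(x = double())
      new_vctr(vec_cast(x, double()), class = "percent")
    format.percent <- function(x, ...) paste0(format(vec_data(x) * 100), "%")
    percent(c(0.1, 0.255))  # prints as "10%" "25.5%"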
TENET identifies key transcription factors (TFs) and regulatory elements (REs) linked to a specific cell type by finding significantly correlated differences in gene expression and RE DNA methylation between case and control input datasets, and identifying the top genes by number of significant RE DNA methylation site links. It also includes many tools for visualization and analysis of the results, including plots displaying and comparing methylation and expression data and methylation site link counts, survival analysis, TF motif searching in the vicinity of linked RE DNA methylation sites, custom TAD and peak overlap analysis, and UCSC Genome Browser track file generation. A utility function is also provided to download methylation, expression, and patient survival data from The Cancer Genome Atlas (TCGA) for use in TENET or other analyses.
Runs the eDITH (environmental DNA Integrating Transport and Hydrology) model, which implements a mass balance of environmental DNA (eDNA) transport at a river network scale coupled with a species distribution model to obtain maps of species distribution. eDITH can work with both eDNA concentration (e.g., obtained via quantitative polymerase chain reaction) or metabarcoding (read count) data. Parameter estimation can be performed via Bayesian techniques (via the BayesianTools package) or optimization algorithms. An interface to the DHARMa package for posterior predictive checks is provided. See Carraro and Altermatt (2024) <doi:10.1111/2041-210X.14317> for a package introduction; Carraro et al. (2018) <doi:10.1073/pnas.1813843115> and Carraro et al. (2020) <doi:10.1038/s41467-020-17337-8> for methodological details.
This package provides a non-parametric framework based on the estimation statistics principle. Its main purpose is to infer the ordering of empirical distributions from different categories, based on the probability of finding a value in one distribution that is greater than the expectation of another distribution. Given a set of ordered pairs of real-category values, the framework is capable of 1) inferring the order of domination among categories and representing these orders in the form of a graph; 2) estimating the magnitude of difference between a pair of categories in the form of mean-difference confidence intervals; and 3) visualizing domination orders and the magnitudes of difference between categories. The methodology of this package is published in Chainarong Amornbunchornvej, Navaporn Surasvadi, Anon Plangprasopchok, and Suttipong Thajchayapong (2020) <doi:10.1016/j.heliyon.2020.e05435>.
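The core quantity just described, together with a bootstrap mean-difference interval, can be sketched from scratch (illustrative, not the package's API):

    # probability that a draw from category a exceeds the expectation of b
    dominates <- function(a, b) mean(a > mean(b))
    # percentile bootstrap confidence interval for the mean difference
    boot_meandiff_ci <- function(a, b, B = 2000, conf = 0.95) {
      d <- replicate(B, mean(sample(a, replace = TRUE)) -
                        mean(sample(b, replace = TRUE)))
      quantile(d, c((1 - conf) / 2, 1 - (1 - conf) / 2))
    }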
Full Consistency Method (FUCOM) for multi-criteria decision-making (MCDM), developed by Dragan Pamucar in 2018 (<doi:10.3390/sym10090393>). The goal of the method is to determine the weights of criteria such that the deviation from full consistency is minimized. Users provide a character vector specifying the ranking of each criterion according to its significance, starting from the criterion expected to have the highest weight to the least significant one. Additionally, users provide a numeric vector specifying the priority values for each criterion. The comparison is made with respect to the first-ranked (most significant) criterion. The function returns the optimized weights for each criterion (summing to 1), the comparative priority (Phi) values, the mathematical transitivity condition (w) value, and the minimum deviation from full consistency (DFC).
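In the fully consistent case (DFC = 0), the weights follow in closed form from the priority values; a from-scratch sketch of that special case (the package's function additionally minimises the deviation from full consistency when it cannot be met exactly):

    # priorities are given relative to the first-ranked criterion (most
    # significant first); Phi_k/(k+1) are ratios of consecutive priorities,
    # and under full consistency the weights obey w_k / w_(k+1) = Phi_k/(k+1)
    fucom_consistent <- function(priorities) {
      phi <- priorities[-1] / priorities[-length(priorities)]
      w1_over_wk <- c(1, cumprod(phi))        # w_1 / w_k for each criterion
      (1 / w1_over_wk) / sum(1 / w1_over_wk)  # normalised to sum to 1
    }
    fucom_consistent(c(1, 2, 4))  # -> 4/7, 2/7, 1/7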