This package provides a novel forward stepwise discriminant analysis framework that integrates Pillai's trace with Uncorrelated Linear Discriminant Analysis (ULDA), improving on traditional stepwise LDA methods that rely on Wilks' Lambda. A stand-alone ULDA implementation is also provided, offering a more general solution than the one available in the MASS package. It automatically handles missing values and provides visualization tools. For more details, see Wang (2024) <doi:10.48550/arXiv.2409.03136>.
Several functions that use different methods to infer a piecewise polynomial regression model under regularity constraints, namely continuity or differentiability of the link function. The implemented functions are either specific to data with two regimes or generic for any number of regimes, which can be given by the user or learned by the algorithm. A paper describing all these methods will be submitted soon. The reference will be added to this file as soon as it is available.
This package provides implementations of various correlation coefficients in common use in Information Retrieval. In particular, it includes the Kendall (1970, isbn:0852641990) tau coefficient as well as tau_a and tau_b for the treatment of ties. It also includes the Yilmaz et al. (2008) <doi:10.1145/1390334.1390435> tauAP correlation coefficient, and the versions tauAP_a and tauAP_b developed by Urbano and Marrero (2017) <doi:10.1145/3121050.3121106> to cope with ties.
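To illustrate how ties enter the tau coefficients, the following is a minimal sketch of the textbook tau_a and tau_b definitions. It is not the package's code: the function name kendall_tau is hypothetical, and the package's conventions for tau_a and tau_b in the Information Retrieval setting may differ from these general formulas.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(x, y):
    """Return (tau_a, tau_b) for two equal-length lists of scores.

    Hypothetical helper using the textbook definitions:
      tau_a = (C - D) / n0
      tau_b = (C - D) / sqrt((n0 - n1) * (n0 - n2))
    where C and D count concordant and discordant pairs, n0 is the total
    number of pairs, and n1, n2 count the pairs tied in x and in y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    tau_a = (concordant - discordant) / n_pairs
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    # tau_b is undefined when one variable is constant.
    tau_b = (concordant - discordant) / denom if denom else float("nan")
    return tau_a, tau_b
```

Without ties the two values coincide. With ties, tau_a keeps the full pair count in the denominator while tau_b shrinks it by the tied pairs.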
These functions take a gene expression value matrix, a primary covariate vector, and an additional known covariates matrix. A two-stage analysis is applied to counter the effects of latent variables on the rankings of hypotheses. The estimation and adjustment of latent effects follow the approach proposed by Sun, Zhang and Owen (2011). "leapp" is developed in the context of microarray experiments, but may be used as a general tool for high-throughput data sets where dependence may be involved.
The functions are designed to find the efficient mean-variance frontier or portfolio weights for static portfolio (called Markowitz portfolio) analysis in resource economics or nature conservation. Using the nonlinear programming solver ('Rsolnp'), this package deals with the quadratic minimization of the variance-covariances without shorting (i.e., non-negative portfolio weights) studied in Ando and Mallory (2012) <doi:10.1073/pnas.1114653109>. For examples, testing versions, and more details, see <https://github.com/ysd2004/portn>.
Programs for analyzing large-scale time series data. They include functions for automatic specification and estimation of univariate time series, for clustering time series, for multivariate outlier detection, for quantile plotting of many time series, for dynamic factor models and for creating input data for deep learning programs. Examples of using the package can be found in the Wiley book Statistical Learning with Big Dependent Data by Daniel Peña and Ruey S. Tsay (2021). ISBN 9781119417385.
This is a core implementation of SAS procedures for linear models - GLM, REG, ANOVA, TTEST, FREQ, and UNIVARIATE. Some R packages provide type II and type III SS. However, their results for nested and complex designs often differ from those of SAS. Different results do not necessarily mean incorrectness. However, many users want the same results as SAS. This package aims to achieve that. Reference: Littell RC, Stroup WW, Freund RJ (2002, ISBN:0-471-22174-0).
R implementation of the software tools developed in the H-CUP (Healthcare Cost and Utilization Project) <https://www.hcup-us.ahrq.gov> and AHRQ (Agency for Healthcare Research and Quality) <https://www.ahrq.gov>. It currently contains functions for mapping ICD-9 codes to the AHRQ comorbidity measures and translating ICD-9 (resp. ICD-10) codes to ICD-10 (resp. ICD-9) codes based on GEM (General Equivalence Mappings) from CMS (Centers for Medicare and Medicaid Services).
Simple tabulation should be dead simple. This package is an opinionated approach to easy tabulations while also providing exact numbers and allowing for re-usability. This is achieved by providing tabulations as data.frames with columns for values, optional variable names, frequency counts including and excluding NAs and percentages for counts including and excluding NAs. Values are also automatically sorted in decreasing order of frequency counts to allow for fast skimming of the most important information.
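The tabulation layout described above can be sketched roughly as follows. This is an illustration only, not the package's interface: the function name tabulate, the column names, and the use of None to stand in for NA are all assumptions.

```python
from collections import Counter


def tabulate(values):
    """Return one row (dict) per distinct value, most frequent first.

    Hypothetical sketch: None plays the role of NA. Each row carries the
    count, the percentage of all observations, and the percentage of
    non-missing observations (None for the NA row itself).
    """
    counts = Counter(values)
    n_all = len(values)
    n_valid = n_all - counts.get(None, 0)
    rows = []
    # sorted() is stable, so values with equal counts keep first-seen order.
    for value, count in sorted(counts.items(), key=lambda item: -item[1]):
        is_na = value is None
        rows.append({
            "value": value,
            "n": count,
            "pct_incl_na": 100 * count / n_all,
            "pct_excl_na": None if is_na else 100 * count / n_valid,
        })
    return rows
```

For example, tabulate(["a", "b", "a", None]) lists "a" first with a count of 2, which is 50% of all observations and two thirds of the non-missing ones.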
The package aims to identify miRNA sponge or ceRNA modules in heterogeneous data. It provides several functions to study miRNA sponge modules at single-sample and multi-sample levels, including popular methods for inferring gene modules (candidate miRNA sponge or ceRNA modules), and two functions to identify miRNA sponge modules at single-sample and multi-sample levels, as well as several functions to conduct modular analysis of miRNA sponge modules.
This package provides qualitative methods for the validation of dynamic models. It contains an orthogonal set of deviance measures for absolute, relative and ordinal scales and approaches accounting for time shifts. The first approach transforms time to take time delays and speed differences into account. The second divides the time series into interval units according to their main features and finds the longest common subsequence (LCS) using a dynamic programming algorithm.
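The dynamic programming step of the second approach can be sketched as a standard LCS computation. This assumes each time series has already been reduced to a sequence of feature labels, one per interval unit. The labels and the function name are hypothetical, and the package's own segmentation and scoring are not reproduced here.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of sequences a and b as a list."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover the subsequence.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]
```

With hypothetical labels such as "A" for an increasing interval and "B" for a decreasing one, the length of the result relative to the length of the inputs gives a rough measure of qualitative agreement between simulation and observation.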
This is a package supporting cluster analysis for cognitive diagnosis based on the Asymptotic Classification Theory (Chiu, Douglas & Li, 2009; doi:10.1007/s11336-009-9125-0). Given the sample statistic of sum-scores, cluster analysis techniques can be used to classify examinees into latent classes based on their attribute patterns. In addition to the algorithms used to classify data, three labeling approaches are proposed to label clusters so that examinees' attribute profiles can be obtained.
An implementation of the model in Steorts (2015) <DOI:10.1214/15-BA965SI>, which performs Bayesian entity resolution for categorical and text data, for any distance function defined by the user. In addition, precision and recall are provided in the package to allow comparison with any other comparable method such as logistic regression, Bayesian additive regression trees (BART), or random forests. The experiments are reproducible and illustrated using a simple vignette. LICENSE: GPL-3 + file license.
Estimates the Concordance Correlation Coefficient to assess agreement. The scenarios considered are non-repeated measures, non-longitudinal repeated measures (replicates) and longitudinal repeated measures. It also includes the estimation of the one-way intraclass correlation coefficient also known as reliability index. The estimation approaches implemented are variance components and U-statistics approaches. Description of methods can be found in Fleiss (1986) <doi:10.1002/9781118032923> and Carrasco et al. (2013) <doi:10.1016/j.cmpb.2012.09.002>.
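For the simplest scenario (non-repeated measures), the concordance correlation coefficient can be illustrated with the moment estimator usually attributed to Lin. This is a sketch only: the package itself uses variance components and U-statistics approaches, which are not reproduced here, and the function name is hypothetical.

```python
def concordance_correlation(x, y):
    """Moment estimate of the concordance correlation coefficient.

    Hypothetical helper computing
      2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2)
    with variances and covariance computed using divisor n.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("coefficient undefined for identical constant series")
    return 2 * cov_xy / denom
```

Unlike the Pearson correlation, a constant shift between the two measurement methods lowers the coefficient: concordance_correlation([1, 2, 3], [2, 3, 4]) equals 4/7 even though the two series are perfectly correlated.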
Analysis of Fluorescence Recovery After Photobleaching (FRAP) experiments using nonlinear mixed-effects regression models and analysis of the results. FRApp is not limited to the analysis of FRAP experiments. Any nonlinear mixed-effects model with an asymptotic exponential functional relationship can be fitted to hierarchical data in various domains. The analysis of data available in the package is presented in Di Credico, G., Pelucchi, S., Pauli, F. et al. (2025) <doi:10.1038/s41598-025-87154-w>.
Set of functions designed to solve inverse problems. The direct problem is used to calculate a cost function to be minimized. Some papers using inverse problem solvers and sensitivity analysis are listed here: (Jader Lugon Jr.; Antonio J. Silva Neto 2011) <doi:10.1590/S1678-58782011000400003>. (Jader Lugon Jr.; Antonio J. Silva Neto; Pedro P.G.W. Rodrigues 2008) <doi:10.1080/17415970802082864>. (Jader Lugon Jr.; Antonio J. Silva Neto; Cesar C. Santana 2008) <doi:10.1080/17415970802082922>.
This is a C++ mutual information (MI) library based on the k-nearest neighbor (KNN) algorithm. There are three functions provided for computing MI for continuous values, mixed continuous and discrete values, and conditional MI for continuous values. They are based on algorithms by A. Kraskov et al. (2004) <doi:10.1103/PhysRevE.69.066138>, BC Ross (2014) <doi:10.1371/journal.pone.0087357>, and A. Tsimpiris (2012) <doi:10.1016/j.eswa.2012.05.014>, respectively.
We implement a surrogate modeling algorithm to guide simulation-based sample size planning. The method is described in detail in our paper (Zimmer & Debelak (2023) <doi:10.1037/met0000611>). It supports multiple study design parameters and optimization with respect to a cost function. It can find optimal designs that correspond to a desired statistical power or that fulfill a cost constraint. We also provide a tutorial paper (Zimmer et al. (2023) <doi:10.3758/s13428-023-02269-0>).
Estimates the sample size needed to detect microbial contamination in a lot with a user-specified detection probability and user-specified analytical sensitivity. Various patterns of microbial contamination are accounted for: homogeneous (Poisson), heterogeneous (Poisson-Gamma) or localized (Zero-inflated Poisson). Ida Jongenburger et al. (2010) <doi:10.1016/j.foodcont.2012.02.004> "Impact of microbial distributions on food safety". Leroy Simon (1963) <doi:10.1017/S0515036100001975> "Casualty Actuarial Society - The Negative Binomial and Poisson Distributions Compared".
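For the homogeneous (Poisson) case only, the idea can be sketched with a back-of-the-envelope calculation. Everything below is an assumption made for illustration and not the package's method: the function and parameter names are hypothetical, "analytical sensitivity" is read as the probability that the assay flags a sample containing at least one organism, and samples are treated as independent.

```python
from math import ceil, exp, log


def samples_needed(concentration, sample_mass, detect_prob, sensitivity=1.0):
    """Smallest number of samples n so that at least one tests positive
    with probability >= detect_prob, under homogeneous contamination.

    Assumptions (illustrative, not taken from the package):
      - organisms per sample ~ Poisson(concentration * sample_mass)
      - a contaminated sample tests positive with probability `sensitivity`
      - samples are independent
    """
    if not 0 < detect_prob < 1:
        raise ValueError("detect_prob must lie strictly between 0 and 1")
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must lie in (0, 1]")
    if concentration <= 0 or sample_mass <= 0:
        raise ValueError("concentration and sample_mass must be positive")
    # Probability that a single sample tests positive.
    p_positive = sensitivity * (1.0 - exp(-concentration * sample_mass))
    if p_positive <= 0.0:
        raise ValueError("contamination too low to be detected numerically")
    if p_positive >= 1.0:
        return 1
    # Solve 1 - (1 - p_positive)^n >= detect_prob for the smallest integer n.
    return ceil(log(1.0 - detect_prob) / log(1.0 - p_positive))
```

Under these assumptions, a concentration of 0.01 organisms per gram, 25 g samples and a 95% detection target require 12 samples. Heterogeneous or localized contamination would need more, which is why the other distributions matter.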
An n-gram is a sequence of n "words" taken, in order, from a body of text. This is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The tokenization and "babbling" are handled by very efficient C code, which can even be built as its own standalone library. The babbler is a simple Markov chain. The package also offers a vignette with complete example workflows and information about the utilities offered in the package.
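The two ideas above, building n-grams and "babbling" with a simple Markov chain, can be sketched as follows. This is not the package's C implementation or its interface: the function names are hypothetical and the text is assumed to be tokenized into words already.

```python
import random
from collections import defaultdict


def ngrams(words, n):
    """Return all n-grams (as tuples) taken in order from a list of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def babble(words, n, length, seed=None):
    """Generate up to `length` words from an n-gram Markov chain.

    Each state is the previous n - 1 words. The next word is drawn from
    the words that followed that state in the source text.
    """
    rng = random.Random(seed)
    chain = defaultdict(list)
    for gram in ngrams(words, n):
        chain[gram[:-1]].append(gram[-1])
    if not chain:
        return []  # text shorter than n words: nothing to babble
    state = rng.choice(list(chain))
    output = list(state)
    # Stop early if the current state never had a continuation.
    while len(output) < length and state in chain:
        output.append(rng.choice(chain[state]))
        state = tuple(output[len(output) - (n - 1):]) if n > 1 else ()
    return output[:length]
```

Because successors are stored with repetition, frequent continuations are drawn proportionally more often, which is what makes the output resemble the source text.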
In causal mediation analysis with multiple causally ordered mediators, a set of path-specific effects are identified under standard ignorability assumptions. This package implements an imputation approach to estimating these effects along with a set of bias formulas for conducting sensitivity analysis (Zhou and Yamamoto <doi:10.31235/osf.io/2rx6p>). It contains two main functions: paths() for estimating path-specific effects and sens() for conducting sensitivity analysis. Estimation uncertainty is quantified using the nonparametric bootstrap.
This package provides tools for penalised maximum likelihood estimation of hidden semi-Markov models (HSMMs) with flexible state dwell-time distributions. These include functions for model fitting, model checking and state-decoding. The package considers HSMMs for univariate time series with state-dependent gamma, normal, Poisson or Bernoulli distributions. For details, see Pohle, J., Adam, T. and Beumer, L.T. (2021): Flexible estimation of the state dwell-time distribution in hidden semi-Markov models. <arXiv:2101.09197>.
Computes noncompartmental pharmacokinetic parameters for drug concentration profiles. For each profile, data imputations and adjustments are made as necessary and basic parameters are estimated. Supports single dose, multi-dose, and multi-subject data. Supports steady-state calculations and various routes of drug administration. See ?qpNCA and vignettes. Methodology follows Rowland and Tozer (2011, ISBN:978-0-683-07404-8), Gabrielsson and Weiner (1997, ISBN:978-91-9765-100-4), and Gibaldi and Perrier (1982, ISBN:978-0824710422).
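As an illustration of what "basic parameters" typically means in noncompartmental analysis, the sketch below computes Cmax, Tmax and the AUC by the linear trapezoidal rule for a single profile. It is a generic textbook calculation, not the qpNCA interface: the function name is hypothetical, and none of the package's imputations or adjustments are applied.

```python
def basic_nca(times, concs):
    """Return Cmax, Tmax and linear-trapezoidal AUC for one profile.

    Hypothetical helper: no imputation, no extrapolation to infinity,
    and no handling of values below the limit of quantification.
    """
    if len(times) != len(concs) or len(times) < 2:
        raise ValueError("need at least two (time, concentration) pairs")
    points = sorted(zip(times, concs))
    # Sum the trapezoid areas between consecutive sampling times.
    auc = sum(
        (t2 - t1) * (c1 + c2) / 2
        for (t1, c1), (t2, c2) in zip(points, points[1:])
    )
    # max() keeps the first maximum, i.e. the earliest time of peak.
    tmax, cmax = max(points, key=lambda point: point[1])
    return {"cmax": cmax, "tmax": tmax, "auc_last": auc}
```

For times 0, 1, 2, 4 and concentrations 0, 10, 6, 2, this gives Cmax = 10 at Tmax = 1 and an AUC of 5 + 8 + 8 = 21 in concentration-time units.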
The highest posterior model is widely accepted as a good model among the available models. In terms of variable selection, the highest posterior model is often the true model. Our stochastic search process SAHPM, based on the simulated annealing maximization method, tries to find the highest posterior model by maximizing the posterior probability over the model space. This package currently contains the SAHPM method only for linear models. Code for GLMs will be added in the future.