Multi-modality data matrices are jointly factorized into the product of a shared sub-matrix and multiple modality-specific sub-matrices, which capture the homogeneous and heterogeneous information, respectively; a group sparse constraint is applied to the shared sub-matrix. The samples are then classified by clustering the shared sub-matrix with kmeanspp(), a new version of kmeans() developed here to obtain concordant results. The package also estimates the number of clusters by rotation cost. Moreover, cluster-specific features can be retrieved using hypergeometric tests.
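As a rough illustration of the hypergeometric test mentioned above, the following Python sketch computes an upper-tail p-value for a feature being over-represented in a cluster. It is not the package's code: the function name, its arguments, and the way a feature is counted as "active" in a sample are assumptions made for illustration only.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for a hypergeometric draw (hypothetical helper).

    N: total number of samples
    K: number of samples in which the feature is active
    n: number of samples in the cluster
    k: number of cluster samples in which the feature is active
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    lo = max(k, 0, n - (N - K))   # smallest feasible overlap at or above k
    hi = min(K, n)                # largest feasible overlap
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return tail / total


# 10 samples, feature active in 5 of them, cluster of 5 samples,
# and all 5 cluster samples carry the feature: p = 1 / C(10, 5) = 1/252.
p_value = hypergeom_upper_tail(5, N=10, K=5, n=5)
```

A small p-value would suggest that the feature is specific to the cluster; how the package defines features and adjusts for multiple testing is not stated here.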
Multivariate estimation and testing; the package currently covers tests for parametric data. Various multivariate normality tests and outlier detection methods are performed and visualized using the ggplot2 package. Homogeneity tests for covariance matrices are also available, as well as Hotelling's T-square test and the multivariate analysis of variance test. We are exploring additional tests and visualization techniques, such as profile analysis and randomized complete block design, to make available and easily accessible to users in future releases.
This package provides tools for the practical management of financial portfolios: backtesting investment and trading strategies, computing profit/loss and returns, analysing trades, handling lists of transactions, reporting, and more. The package provides a small set of reliable, efficient and convenient tools for processing and analysing trade/portfolio data. The manual provides all the details; it is available from <https://enricoschumann.net/R/packages/PMwR/manual/PMwR.html>. Examples and descriptions of new features are provided at <https://enricoschumann.net/notes/PMwR/>.
Create, transform, and summarize custom random variables with distribution functions (analogues of the p*(), d*(), q*(), and r*() functions from base R). Two types of distributions are supported: "discrete" (the random variable has a finite number of output values) and "continuous" (an infinite number of values in the form of a continuous random variable). Functions for distribution transformations and summaries are available. The implemented approaches often emphasize approximate and numerical solutions: all distributions assume finite support and finite values of the density function, and some methods are implemented with simulation techniques.
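To make the p/d/q/r analogy concrete, here is a minimal Python sketch that builds the four functions for a "discrete" distribution from a table of values and probabilities. The name make_discrete and its return convention are hypothetical and do not reflect the package's API; distinct values are assumed.

```python
import bisect
import random
from itertools import accumulate


def make_discrete(values, probs):
    """Return (d, p, q, r) functions for a finite discrete distribution."""
    if not values or len(values) != len(probs):
        raise ValueError("values and probs must be non-empty and equally long")
    if any(pr < 0 for pr in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be non-negative and sum to 1")
    pairs = sorted(zip(values, probs))
    xs = [x for x, _ in pairs]
    ps = [pr for _, pr in pairs]
    cdf = list(accumulate(ps))

    def d(x):
        """Probability mass at x (0 if x is not a support point)."""
        i = bisect.bisect_left(xs, x)
        return ps[i] if i < len(xs) and xs[i] == x else 0.0

    def p(x):
        """Cumulative probability P(X <= x)."""
        i = bisect.bisect_right(xs, x)
        return cdf[i - 1] if i > 0 else 0.0

    def q(prob):
        """Smallest support point whose cumulative probability is >= prob."""
        if not 0 <= prob <= 1:
            raise ValueError("prob must lie in [0, 1]")
        i = bisect.bisect_left(cdf, prob)
        return xs[min(i, len(xs) - 1)]  # guard against rounding in cdf[-1]

    def r(n, rng=random):
        """Draw n random values; rng may be a seeded random.Random."""
        return rng.choices(xs, weights=ps, k=n)

    return d, p, q, r


d, p, q, r = make_discrete([1, 2, 3], [0.2, 0.5, 0.3])
draws = r(5, random.Random(1))  # seeded for reproducibility
```

Here d(2) is 0.5, p(2) is about 0.7, and q(0.5) is 2, mirroring the density, distribution, and quantile functions of base R.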
Calculates the Probability Plot Correlation Coefficient (PPCC) between a continuous variable X and a specified distribution. The corresponding composite hypothesis test, first introduced by Filliben (1975) <doi:10.1080/00401706.1975.10489279>, can be performed to test whether the sample X is drawn from the Normal, log-Normal, Exponential, Uniform, Cauchy, Logistic, Generalized Logistic, Gumbel (GEVI), Weibull, Generalized Extreme Value, Pearson III (Gamma 2), Mielke's Kappa, or Rayleigh distribution. The PPCC test is performed with a fast Monte-Carlo simulation.
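The idea behind the PPCC test can be sketched in a few lines of Python for the Normal case. This is an illustrative sketch, not the package's implementation: the function names, the use of Filliben's median plotting positions, and the "+1" Monte Carlo p-value are assumptions.

```python
import math
import random
from statistics import NormalDist


def filliben_positions(n):
    """Filliben's approximate medians of the uniform order statistics."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def pearson(x, y):
    """Pearson correlation; fails with ZeroDivisionError on constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ppcc_normal(sample):
    """Correlation between the ordered sample and standard normal quantiles."""
    n = len(sample)
    if n < 3:
        raise ValueError("need at least 3 observations")
    quantiles = [NormalDist().inv_cdf(u) for u in filliben_positions(n)]
    return pearson(sorted(sample), quantiles)


def ppcc_test_normal(sample, n_sim=1000, seed=0):
    """Return (PPCC, Monte Carlo p-value) under the Normal hypothesis."""
    rng = random.Random(seed)
    observed = ppcc_normal(sample)
    n = len(sample)
    # Count simulated normal samples whose PPCC is as low as the observed one.
    hits = sum(
        ppcc_normal([rng.gauss(0.0, 1.0) for _ in range(n)]) <= observed
        for _ in range(n_sim)
    )
    return observed, (hits + 1) / (n_sim + 1)


# A strongly skewed sample yields a low PPCC and a small p-value.
r_obs, p_val = ppcc_test_normal([1, 1, 1, 1, 1, 1, 1, 1, 1, 100], n_sim=200)
```

Because the PPCC is invariant to location and scale, simulating from the standard Normal is enough for the composite hypothesis; distributions with a shape parameter need more care than this sketch provides.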
The aim of the package is to provide some basic functions for doing statistics with one-dimensional fuzzy data (in the form of polygonal fuzzy numbers). In particular, the package contains functions for the basic operations on the class of fuzzy numbers (sum, scalar product, mean, median, Hukuhara difference) as well as for calculating the (Bertoluzza) distance and sample variance. Moreover, a function to simulate fuzzy random variables and bootstrap tests for the equality of means are included. Version 2.1 fixes some bugs of previous versions.
Specialized toolkit for processing biological and fisheries data from Peru's anchovy (Engraulis ringens) fishery. Provides functions to analyze fishing logbooks, calculate biological indicators (length-weight relationships, juvenile percentages), generate spatial fishing indicators, and visualize regulatory measures from Peru's Ministry of Production. Features automated data processing from multiple file formats, coordinate validation, spatial analysis of fishing zones, and tools for analyzing fishing closure announcements and regulatory compliance. Includes built-in datasets of Peruvian coastal coordinates and parallel lines for analyzing fishing activities within regulatory zones.
The TEQR package contains software to calculate the operating characteristics for the TEQR and the ACT designs. The TEQR (toxicity equivalence range) design is a toxicity-based cumulative cohort design with added safety rules. The ACT (activity constrained for toxicity) design is also a cumulative cohort design with additional safety rules. The unique feature of the ACT design is that the dose is escalated based on lack of activity rather than on lack of toxicity and is de-escalated only if an unacceptable level of toxicity is experienced.
Implementation of the tree-guided feature selection and logic aggregation approach introduced in Chen et al. (2024) <doi:10.1080/01621459.2024.2326621>. The method enables the selection and aggregation of large-scale rare binary features with a known hierarchical structure using a convex, linearly-constrained regularized regression framework. The package facilitates the application of this method to both linear regression and binary classification problems by solving the optimization problem via the smoothing proximal gradient descent algorithm (Chen et al. (2012) <doi:10.1214/11-AOAS514>).
The main purpose of this package is to provide the algorithmic complexity for short strings, an approximation of the Kolmogorov complexity of a short string obtained using the coding theorem method. While the database containing the complexity values is provided in the data-only package acss.data, this package provides functions for accessing the data, such as prob_random, which returns the posterior probability that a given string was produced by a random process. In addition, two traditional (but problematic) measures of complexity are also provided: entropy and change complexity.
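One commonly cited limitation of entropy as a complexity measure is that it depends only on symbol frequencies, not on their order. The Python sketch below (a hypothetical helper, not the package's function) shows a strictly periodic string and an irregular one receiving the same first-order Shannon entropy.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """First-order Shannon entropy of a string, in bits per symbol."""
    if not s:
        raise ValueError("string must be non-empty")
    n = len(s)
    # sum of p * log2(1 / p) over the observed symbols
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


periodic = "01010101"
irregular = "01101001"
same = shannon_entropy(periodic) == shannon_entropy(irregular)  # True: both 1.0
```

Both strings contain four zeros and four ones, so each has an entropy of exactly 1 bit per symbol even though the first is a plain repetition of "01".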
RNA abundance and cell size parameters could improve RNA-seq deconvolution algorithms to more accurately estimate cell type proportions given the different cell type transcription activity levels. A Total RNA Expression Gene (TREG) can facilitate estimating total RNA content using single molecule fluorescent in situ hybridization (smFISH). We developed a data-driven approach using a measure of expression invariance to find candidate TREGs in postmortem human brain single nucleus RNA-seq. This R package implements the method for identifying candidate TREGs from snRNA-seq data.
This package provides tools to fit Bayesian state-space models to animal tracking data. Models are provided for location filtering, location filtering and behavioural state estimation, and their hierarchical versions. The models are primarily intended for fitting to ARGOS satellite tracking data but options exist to fit to other tracking data types. For Global Positioning System data, consider the moveHMM package. Simplified Markov Chain Monte Carlo convergence diagnostic plotting is provided but users are encouraged to explore tools available in packages such as coda and boa.
This package performs regression analysis for longitudinal count data, allowing for serial dependence among observations from a given individual and two-dimensional random effects on the linear predictor. Estimation is via maximization of the exact likelihood of a suitably defined model. Missing values and unbalanced data are allowed. Details can be found in the accompanying scientific papers: Goncalves & Cabral (2021, Journal of Statistical Software, <doi:10.18637/jss.v099.i03>) and Goncalves et al. (2007, Computational Statistics & Data Analysis, <doi:10.1016/j.csda.2007.03.002>).
This package provides a distributed framework for simulating and estimating skew factor models under various skewed and heavy-tailed distributions. The methods support distributed data generation, aggregation of local estimators, and evaluation of estimation performance via mean squared error, relative error, and sparsity measures. The distributed principal component (PC) estimators implemented in the package include IPC (Independent Principal Component), PPC (Project Principal Component), SPC (Sparse Principal Component), and other related distributed PC methods. The methodological background follows Guo G. (2023) <doi:10.1007/s00180-022-01270-z>.
Dose Titration Algorithm Tuning (DTAT) is a methodologic framework allowing dose individualization to be conceived as a continuous learning process that begins in early-phase clinical trials and continues throughout drug development, on into clinical practice. This package includes code that researchers may use to reproduce or extend key results of the DTAT research programme, plus tools for trialists to design and simulate a 3+3/PC dose-finding study. Please see Norris (2017a) <doi:10.12688/f1000research.10624.3> and Norris (2017c) <doi:10.1101/240846>.
This package provides methods to "add" two R tables; also an alternative interpretation of named vectors as generalized R tables, so that c(a=1,b=2,c=3) + c(b=3,a=-1) will return c(b=5,c=3). Uses disordR discipline (Hankin, 2022, <doi:10.48550/arXiv.2210.03856>). Extraction and replacement methods are provided. The underlying mathematical structure is the Free Abelian group, hence the name. To cite in publications please use Hankin (2023) <doi:10.48550/arXiv.2307.13184>.
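The addition rule in the example above can be mimicked in Python with dictionaries, as a rough analogue rather than the package's implementation; the function name add_tables is hypothetical. Values for matching names are summed, names present in only one operand are carried over, and entries that sum to zero are dropped.

```python
def add_tables(a, b):
    """Add two name -> value tables, dropping entries whose sum is zero."""
    total = dict(a)
    for key, value in b.items():
        total[key] = total.get(key, 0) + value
    return {key: value for key, value in total.items() if value != 0}


result = add_tables({"a": 1, "b": 2, "c": 3}, {"b": 3, "a": -1})
# result == {"b": 5, "c": 3}: "a" cancels to zero and is removed
```

Dictionary equality in Python ignores insertion order, which loosely matches the idea that the order of entries in such a table carries no meaning.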
To help you access, transform, analyze, and visualize ForestGEO data, we developed a collection of R packages (<https://forestgeo.github.io/fgeo/>). This package, in particular, helps you to install and load the entire package-collection with a single R command, and provides convenient ways to find relevant documentation. Most commonly, you should not worry about the individual packages that make up the package-collection as you can access all features via this package. To learn more about ForestGEO visit <http://www.forestgeo.si.edu/>.
This package implements an algorithm for fitting a generative model with an intractable likelihood using only box constraints on the parameters. The implemented algorithm consists of two phases. The first phase (global search) aims to identify the region containing the best solution, while the second phase (local search) refines this solution using a trust-region version of the Fisher scoring method to solve a quasi-likelihood equation. See Guido Masarotto (2025) <doi:10.48550/arXiv.2511.08180> for the details of the algorithm and supporting results.
Fit a full or subsampling bagging survival tree on a mixture of populations (susceptible and nonsusceptible) using either a pseudo R2 criterion or an adjusted Logrank criterion. The predictor is evaluated using the Out Of Bag Integrated Brier Score (IBS) and several importance scores are computed for variable selection. The threshold values for variable selection are computed using a nonparametric permutation test. See Cyprien Mbogning and Philippe Broet (2016) <doi:10.1186/s12859-016-1090-x> for an overview of the methods implemented in this package.
This package implements proper and so-called Maximum Likelihood Multiple Imputation as described by von Hippel and Bartlett (2021) <doi:10.1214/20-STS793>. A number of different imputation methods are available, by utilising the norm, cat and mix packages. Inferences can be performed either using Rubin's rules (for proper imputation), or a modified version for maximum likelihood imputation. For maximum likelihood imputations, a likelihood score based approach built on theory by Wang and Robins (1998) <doi:10.1093/biomet/85.4.935> is also available.
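For orientation, Rubin's rules for a scalar parameter can be sketched in Python as follows. This covers only the standard pooling for proper imputation; the modified rules for maximum likelihood imputation are not shown, and the function name pool_rubin is hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M completed-data estimates and their variances (Rubin's rules).

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/M) * B, where W is the mean within-imputation variance
    and B is the between-imputation variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor M - 1
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Three imputed-data fits with estimates 1, 2, 3 and variance 0.5 each:
# pooled estimate 2.0, total variance 0.5 + (4/3) * 1.0
q_bar, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

Confidence intervals additionally require the degrees-of-freedom formula that accompanies Rubin's rules, which is omitted from this sketch.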
This package contains model-based treatment of missing data for regression models with missing values in covariates or the dependent variable using maximum likelihood or Bayesian estimation (Ibrahim et al., 2005; <doi:10.1198/016214504000001844>; Luedtke, Robitzsch, & West, 2020a, 2020b; <doi:10.1080/00273171.2019.1640104>, <doi:10.1037/met0000233>). The regression model can be nonlinear (e.g., interaction effects, quadratic effects or B-spline functions). Multilevel models with missing data in predictors are available for Bayesian estimation. Substantive-model compatible multiple imputation can also be conducted.
We consider network structure detection for variables Y while accommodating auxiliary variables X, which are possibly subject to measurement error. Three functions address different structures with different methods: NP_Graph() handles nonlinear relationships between the responses and the covariates, Joint_Gaussian() performs correction in linear regression models via Gaussian maximum likelihood, and Cond_Gaussian() handles linear regression models via the conditional likelihood function.
Implementation of prediction and inference procedures for Synthetic Control methods using least squares, lasso, ridge, or simplex-type constraints. Uncertainty is quantified with prediction intervals as developed in Cattaneo, Feng, and Titiunik (2021) <doi:10.1080/01621459.2021.1979561> for a single treated unit and in Cattaneo, Feng, Palomba, and Titiunik (2025) <doi:10.1162/rest_a_01588> for multiple treated units and staggered adoption. More details about the software implementation can be found in Cattaneo, Feng, Palomba, and Titiunik (2025) <doi:10.18637/jss.v113.i01>.
Access and manipulate spatial tracking data, with straightforward coercion from and to other formats. Filter for speed and create time-spent maps from tracking data. There are coercion methods to convert between trip and ltraj from adehabitatLT, and between trip and psp and ppp from spatstat. Trip objects can be created from raw or grouped data frames, and from types in the sp, sf, amt, trackeR, mousetrap, and other packages. See Sumner, MD (2011) <https://figshare.utas.edu.au/articles/thesis/The_tag_location_problem/23209538>.