This package provides interface to the MATLAB toolbox Flexible Statistical Data Analysis (FSDA) which is comprehensive and computationally efficient software package for robust statistics in regression, multivariate and categorical data analysis. The current R version implements tools for regression: (forward search, S- and MM-estimation, least trimmed squares (LTS) and least median of squares (LMS)), for multivariate analysis (forward search, S- and MM-estimation), for cluster analysis and cluster-wise regression. The distinctive feature of our package is the possibility of monitoring the statistics of interest as a function of breakdown point, efficiency or subset size, depending on the estimator. This is accompanied by a rich set of graphical features, such as dynamic brushing, linking, particularly useful for exploratory data analysis.
This package provides a model that provides researchers with a powerful tool for the classification and study of native corn by aiding in the identification of racial complexes which are fundamental to Mexico's agriculture and culture. This package has been developed based on data collected by "Proyecto Global de Maà ces Nativos México", which has conducted exhaustive surveys across the country to document the qualitative and quantitative characteristics of different types of native maize. The trained model uses a robust and diverse dataset, enabling it to achieve an 80% accuracy in classifying maize racial complexes. The characteristics included in the analysis comprise geographic location, grain and cob colors, as well as various physical measurements, such as lengths and widths.
The age is estimated by calculating the Dirichlet Normal Energy (DNE) on the whole auricular surface and the apex of the auricular surface. It involves three estimation methods: principal component discriminant analysis (PCQDA), and principal component logistic regression analysis (PCLR) methods, principal component regression analysis with Southeast Asian (A_PCR), and principal component regression analysis with multipopulation (M_PCR). The package is created with the data from the Louis Lopes Collection in Lisbon, the 21st Century Identified Human Remains Collection in Coimbra, and the CAL Milano Cemetery Skeletal Collection in Milan, and the skeletal collection at Khon Kaen University (KKU) Human Skeletal Research Centre (HSRC), housed in the Department of Anatomy in the Faculty of Medicine at KKU in Khon Kaen.
Fits Stable Isotope Mixing Models (SIMMs) and is meant as a longer term replacement to the previous widely-used package SIAR. SIMMs are used to infer dietary proportions of organisms consuming various food sources from observations on the stable isotope values taken from the organisms tissue samples. However SIMMs can also be used in other scenarios, such as in sediment mixing or the composition of fatty acids. The main functions are simmr_load() and simmr_mcmc(). The two vignettes contain a quick start and a full listing of all the features. The methods used are detailed in the papers Parnell et al 2010 <doi:10.1371/journal.pone.0009672>, and Parnell et al 2013 <doi:10.1002/env.2221>.
Run mixed-effects models that include weights at every level. The WeMix package fits a weighted mixed model, also known as a multilevel, mixed, or hierarchical linear model (HLM). The weights could be inverse selection probabilities, such as those developed for an education survey where schools are sampled probabilistically, and then students inside of those schools are sampled probabilistically. Although mixed-effects models are already available in R, WeMix is unique in implementing methods for mixed models using weights at multiple levels. Both linear and logit models are supported. Models may have up to three levels. Random effects are estimated using the PIRLS algorithm from lme4pureR (Walker and Bates (2013) <https://github.com/lme4/lme4pureR>).
Includes the reference implementation of Genie - a hierarchical clustering algorithm that links two point groups in such a way that an inequity measure (namely, the Gini index) of the cluster sizes does not significantly increase above a given threshold. This method most often outperforms many other data segmentation approaches in terms of clustering quality as tested on a wide range of benchmark datasets. At the same time, Genie retains the high speed of the single linkage approach, therefore it is also suitable for analysing larger data sets. For more details see (Gagolewski et al. 2016 <DOI:10.1016/j.ins.2016.05.003>). For an even faster and more feature-rich implementation, including, amongst others, noise point detection, see the genieclust package.
This package provides a Low Rank Correction Variational Bayesian algorithm for high-dimensional multi-source heterogeneous quantile linear models. More details have been written up in a paper submitted to the journal Statistics in Medicine, and the details of variational Bayesian methods can be found in Ray and Szabo (2021) <doi:10.1080/01621459.2020.1847121>. It simultaneously performs parameter estimation and variable selection. The algorithm supports two model settings: (1) local models, where variable selection is only applied to homogeneous coefficients, and (2) global models, where variable selection is also performed on heterogeneous coefficients. Two forms of parameter estimation are output: one is the standard variational Bayesian estimation, and the other is the variational Bayesian estimation corrected with low-rank adjustment.
Data and utilities for estimating pediatric blood pressure percentiles by sex, age, and optionally height (stature) as described in Martin et.al. (2022) <doi:10.1001/jamanetworkopen.2022.36918>. Blood pressure percentiles for children under one year of age come from Gemelli et.al. (1990) <doi:10.1007/BF02171556>. Estimates of blood pressure percentiles for children at least one year of age are informed by data from the National Heart, Lung, and Blood Institute (NHLBI) and the Centers for Disease Control and Prevention (CDC) <doi:10.1542/peds.2009-2107C> or from Lo et.al. (2013) <doi:10.1542/peds.2012-1292>. The flowchart for selecting the informing data source comes from Martin et.al. (2022) <doi:10.1542/hpeds.2021-005998>.
An efficient tool for fitting nested mixture models based on a shared set of atoms via Markov Chain Monte Carlo and variational inference algorithms. Specifically, the package implements the common atoms model (Denti et al., 2023), its finite version (similar to D'Angelo et al., 2023), and a hybrid finite-infinite model (D'Angelo and Denti, 2024). All models implement univariate nested mixtures with Gaussian kernels equipped with a normal-inverse gamma prior distribution on the parameters. Additional functions are provided to help analyze the results of the fitting procedure. References: Denti, Camerlenghi, Guindani, Mira (2023) <doi:10.1080/01621459.2021.1933499>, Dâ Angelo, Canale, Yu, Guindani (2023) <doi:10.1111/biom.13626>, Dâ Angelo, Denti (2024) <doi:10.1214/24-BA1458>.
This package provides tools for creating and working with survey replicate weights, extending functionality of the survey package from Lumley (2004) <doi:10.18637/jss.v009.i08>. Implements bootstrap methods for complex surveys, including the generalized survey bootstrap as described by Beaumont and Patak (2012) <doi:10.1111/j.1751-5823.2011.00166.x>. Methods are provided for applying nonresponse adjustments to both full-sample and replicate weights as described by Rust and Rao (1996) <doi:10.1177/096228029600500305>. Implements methods for sample-based calibration described by Opsomer and Erciulescu (2021) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2021002/article/00006-eng.htm>. Diagnostic functions are included to compare weights and weighted estimates from different sets of replicate weights.
Implementation of SPECS, your favourite Single-Equation Penalized Error-Correction Selector developed in Smeekes and Wijler (2021) <doi:10.1016/j.jeconom.2020.07.021>. SPECS provides a fully automated estimation procedure for large and potentially (co)integrated datasets. The dataset in levels is converted to a conditional error-correction model, either by the user or by means of the functions included in this package, and various specialised forms of penalized regression can be applied to the model. Automated options for initializing and selecting a sequence of penalties, as well as the construction of penalty weights via an initial estimator, are available. Moreover, the user may choose from a number of pre-specified deterministic configurations to further simplify the model building process.
The _CAGEr_ package identifies transcription start sites (TSS) and their usage frequency from CAGE (Cap Analysis Gene Expression) sequencing data. It normalises raw CAGE tag count, clusters TSSs into tag clusters (TC) and aggregates them across multiple CAGE experiments to construct consensus clusters (CC) representing the promoterome. CAGEr provides functions to profile expression levels of these clusters by cumulative expression and rarefaction analysis, and outputs the plots in ggplot2 format for further facetting and customisation. After clustering, CAGEr performs analyses of promoter width and detects differential usage of TSSs (promoter shifting) between samples. CAGEr also exports its data as genome browser tracks, and as R objects for downsteam expression analysis by other Bioconductor packages such as DESeq2, CAGEfightR, or seqArchR.
Construction and smart selection of Gaussian process models for analysis of computer experiments with emphasis on treatment of functional inputs that are regularly sampled. This package offers: (i) flexible modeling of functional-input regression problems through the fairly general Gaussian process model; (ii) built-in dimension reduction for functional inputs; (iii) heuristic optimization of the structural parameters of the model (e.g., active inputs, kernel function, type of distance). An in-depth tutorial in the use of funGp is provided in Betancourt et al. (2024) <doi:10.18637/jss.v109.i05> and Metamodeling background is provided in Betancourt et al. (2020) <doi:10.1016/j.ress.2020.106870>. The algorithm for structural parameter optimization is described in <https://hal.science/hal-02532713>.
Includes the ga.lts() function that estimates LTS (Least Trimmed Squares) parameters using genetic algorithms and C-steps. ga.lts() constructs a genetic algorithm to form a basic subset and iterates C-steps as defined in Rousseeuw and van-Driessen (2006) to calculate the cost value of the LTS criterion. OLS (Ordinary Least Squares) regression is known to be sensitive to outliers. A single outlying observation can change the values of estimated parameters. LTS is a resistant estimator even the number of outliers is up to half of the data. This package is for estimating the LTS parameters with lower bias and variance in a reasonable time. Version >=1.3 includes the function medmad for fast outlier detection in linear regression.
Structure and formatting requirements for clinical trial table and listing outputs vary between pharmaceutical companies. junco provides additional tooling for use alongside the rtables', rlistings and tern packages when creating table and listing outputs. While motivated by the specifics of Johnson and Johnson Clinical and Statistical Programming's table and listing shells, junco provides functionality that is general and reusable. Major features include a) alternative and extended statistical analyses beyond what tern supports for use in standard safety and efficacy tables, b) a robust production-grade Rich Text Format (RTF) exporter for both tables and listings, c) structural support for spanning column headers and risk difference columns in tables, and d) robust font-aware automatic column width algorithms for both listings and tables.
The MARSS package provides maximum-likelihood parameter estimation for constrained and unconstrained linear multivariate autoregressive state-space (MARSS) models, including partially deterministic models. MARSS models are a class of dynamic linear model (DLM) and vector autoregressive model (VAR) model. Fitting available via Expectation-Maximization (EM), BFGS (using optim), and TMB (using the marssTMB companion package). Functions are provided for parametric and innovations bootstrapping, Kalman filtering and smoothing, model selection criteria including bootstrap AICb, confidences intervals via the Hessian approximation or bootstrapping, and all conditional residual types. See the user guide for examples of dynamic factor analysis, dynamic linear models, outlier and shock detection, and multivariate AR-p models. Online workshops (lectures, eBook, and computer labs) at <https://atsa-es.github.io/>.
Identifying maturation stages across young athletes is paramount for talent identification. Furthermore, the concept of biobanding, or grouping of athletes based on their biological development, instead of their chronological age, has been widely researched. The goal of this package is to help professionals working in the field of strength & conditioning and talent ID obtain common maturation metrics and as well as to quickly visualize this information via several plotting options. For the methods behind the computed maturation metrics implemented in this package refer to Khamis, H. J., & Roche, A. F. (1994) <https://pubmed.ncbi.nlm.nih.gov/7936860/>, Mirwald, R.L et al., (2002) <https://pubmed.ncbi.nlm.nih.gov/11932580/> and Cumming, Sean P. et al., (2017) <doi:10.1519/SSC.0000000000000281>.
Implementations of several methods for principal component analysis using the L1 norm. The package depends on COIN-OR Clp version >= 1.17.4. The methods implemented are PCA-L1 (Kwak 2008) <DOI:10.1109/TPAMI.2008.114>, L1-PCA (Ke and Kanade 2003, 2005) <DOI:10.1109/CVPR.2005.309>, L1-PCA* (Brooks, Dula, and Boone 2013) <DOI:10.1016/j.csda.2012.11.007>, L1-PCAhp (Visentin, Prestwich and Armagan 2016) <DOI:10.1007/978-3-319-46227-1_37>, wPCA (Park and Klabjan 2016) <DOI: 10.1109/ICDM.2016.0054>, awPCA (Park and Klabjan 2016) <DOI: 10.1109/ICDM.2016.0054>, PCA-Lp (Kwak 2014) <DOI:10.1109/TCYB.2013.2262936>, and SharpEl1-PCA (Brooks and Dula, submitted).
The developed package can be used to generate a spatial population for different levels of relationships among the dependent and auxiliary variables along with spatially varying model parameters. A spatial layout is designed as a [0,k-1]x[0,k-1] square region on which observations are collected at (k x k) lattice points with a unit distance between any two neighbouring points along the horizontal and vertical axes. For method details see Chao, Liu., Chuanhua, Wei. and Yunan, Su. (2018).<doi:10.1080/10485252.2018.1499907>. The generated spatial population can be utilized in Geographically Weighted Regression model based analysis for studying the spatially varying relationships among the variables. Furthermore, various statistical analysis can be performed on this spatially generated data.
Implementation of functions for fitting taper curves (a semiparametric linear mixed effects taper model) to diameter measurements along stems. Further functions are provided to estimate the uncertainty around the predicted curves, to calculate timber volume (also by sections) and marginal (e.g., upper) diameters. For cases where tree heights are not measured, methods for estimating additional variance in volume predictions resulting from uncertainties in tree height models (tariffs) are provided. The example data include the taper curve parameters for Norway spruce used in the 3rd German NFI fitted to 380 trees and a subset of section-wise diameter measurements of these trees. The functions implemented here are detailed in Kublin, E., Breidenbach, J., Kaendler, G. (2013) <doi:10.1007/s10342-013-0715-0>.
dinoR tests for significant differences in NOMe-seq footprints between two conditions, using genomic regions of interest (ROI) centered around a landmark, for example a transcription factor (TF) motif. This package takes NOMe-seq data (GCH methylation/protection) in the form of a Ranged Summarized Experiment as input. dinoR can be used to group sequencing fragments into 3 or 5 categories representing characteristic footprints (TF bound, nculeosome bound, open chromatin), plot the percentage of fragments in each category in a heatmap, or averaged across different ROI groups, for example, containing a common TF motif. It is designed to compare footprints between two sample groups, using edgeR's quasi-likelihood methods on the total fragment counts per ROI, sample, and footprint category.
Genome wide studies of translational control is emerging as a tool to study various biological conditions. The output from such analysis is both the mRNA level (e.g. cytosolic mRNA level) and the level of mRNA actively involved in translation (the actively translating mRNA level) for each mRNA. The standard analysis of such data strives towards identifying differential translational between two or more sample classes - i.e., differences in actively translated mRNA levels that are independent of underlying differences in cytosolic mRNA levels. This package allows for such analysis using partial variances and the random variance model. As 10s of thousands of mRNAs are analyzed in parallel the library performs a number of tests to assure that the data set is suitable for such analysis.
This package provides a computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman's random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available.
This package provides a set of psychometric tools for cognitive diagnosis modeling based on the generalized deterministic inputs, noisy and gate (G-DINA) model by de la Torre (2011) doi:10.1007/s11336-011-9207-7 and its extensions, including the sequential G-DINA model by Ma and de la Torre (2016) doi:10.1111/bmsp.12070 for polytomous responses, and the polytomous G-DINA model by Chen and de la Torre doi:10.1177/0146621613479818 for polytomous attributes. Joint attribute distribution can be independent, saturated, higher-order, loglinear smoothed or structured. Q-matrix validation, item and model fit statistics, model comparison at test and item level and differential item functioning can also be conducted. A graphical user interface is also provided.