This package contains a single function dclust() for divisive hierarchical clustering based on recursive k-means partitioning (k = 2). It is useful for clustering large datasets where computation of an n x n distance matrix is not feasible (e.g. n > 10,000 records). For further information see Steinbach, Karypis and Kumar (2000) <http://glaros.dtc.umn.edu/gkhome/fetch/papers/docclusterKDDTMW00.pdf>.
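A minimal sketch of how dclust() might be called; only the function name comes from the description above, so treat the call shape and the return type as assumptions:

    # Divisive clustering of a large matrix without an n x n distance matrix.
    library(dclust)
    x <- matrix(rnorm(20000 * 5), ncol = 5)  # 20,000 records, 5 features
    hc <- dclust(x)  # recursive 2-means partitioning (assumed default call)
    plot(hc)         # assuming an hclust-like object is returned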
Dynamic treatment regime estimation and inference via G-estimation, dynamic weighted ordinary least squares (dWOLS) and Q-learning. Inference via bootstrap and recursive sandwich estimation. Estimation and inference for survival outcomes via Dynamic Weighted Survival Modeling (DWSurv). Extension to continuous treatment variables. Wallace et al. (2017) <DOI:10.18637/jss.v080.i02>; Simoneau et al. (2020) <DOI:10.1080/00949655.2020.1793341>.
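A hedged one-stage dWOLS sketch; the DTRreg() interface shown (per-stage lists of blip, treatment and treatment-free formulas) reflects the package's JSS paper as I recall it, so verify against the documentation:

    library(DTRreg)
    set.seed(1)
    dat <- data.frame(x = rnorm(200))
    dat$a <- rbinom(200, 1, plogis(dat$x))             # treatment assignment
    dat$y <- dat$x + dat$a * (1 + dat$x) + rnorm(200)  # outcome, blip = 1 + x
    fit <- DTRreg(outcome = y,
                  blip.mod  = list(~ x),    # treatment effect (blip) model
                  treat.mod = list(a ~ x),  # treatment (propensity) model
                  tf.mod    = list(~ x),    # treatment-free model
                  method = "dwols", data = dat)
    fit$psi  # estimated blip parameters (assumed accessor)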
This package provides a collection of datasets essential for functional genomic analysis. Gene names, gene positions and cytoband information sourced from Ensembl, as well as a phenotype association graph prepared from the GWAS Catalog, are included. Data are available in both GRCh37 and GRCh38 builds. These datasets facilitate a wide range of genomic studies, including the identification of genetic variants, exploration of genomic features, and post-GWAS functional analysis.
Generalized LassO applied to knot selection in multivariate B-splinE Regression (GLOBER) implements a novel approach for estimating functions in a multivariate nonparametric regression model based on an adaptive knot selection for B-splines using the Generalized Lasso. For further details we refer the reader to the paper Savino, M. E. and Lévy-Leduc, C. (2023), <arXiv:2306.00686>.
An approach to analyzing Likert response items, with an emphasis on visualizations. The stacked bar plot is the preferred method for presenting Likert results. Tabular results are also implemented, along with density plots to assist researchers in determining whether Likert responses can be used quantitatively instead of qualitatively. See the likert(), summary.likert(), and plot.likert() functions to get started.
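Getting-started sketch using the functions named above; the pisaitems demo data frame is assumed to ship with the package:

    library(likert)
    data(pisaitems)
    items <- pisaitems[, substr(names(pisaitems), 1, 5) == "ST24Q"]
    l <- likert(items)  # tabulate the Likert items
    summary(l)          # tabular results
    plot(l)             # stacked bar plot, the preferred presentation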
The proposed method predicts the longitudinal mean response trajectory by a kernel-based estimator. The kernel estimator is constructed by imposing weights based on subject-wise similarity in L2 metric space between predictor trajectories as well as on time proximity. Users can also perform variable selection to derive functional predictors with predictive significance via the proposed multiplicative model with multivariate Gaussian kernels.
Complete analytical environment for the construction and analysis of matrix population models and integral projection models. Includes the ability to construct historical matrices, which are 2D matrices comprising demographic information from three consecutive time steps. Estimates both raw and function-based forms of historical and standard ahistorical matrices. It also estimates function-based age-by-stage matrices and raw and function-based Leslie matrices.
Implementation of hypothesis testing procedures described in Hansen (1992) <doi:10.1002/jae.3950070506>, Carrasco, Hu, & Ploberger (2014) <doi:10.3982/ECTA8609>, Dufour & Luger (2017) <doi:10.1080/07474938.2017.1307548>, and Rodriguez Rondon & Dufour (2024) <https://grodriguezrondon.com/files/RodriguezRondon_Dufour_2024_MonteCarlo_LikelihoodRatioTest_MarkovSwitchingModels_20241015.pdf> that can be used to identify the number of regimes in Markov switching models.
Generate maximum projection (MaxPro) designs for quantitative and/or qualitative factors. Details of the MaxPro criterion can be found in: (1) Joseph, Gul, and Ba (2015) "Maximum Projection Designs for Computer Experiments", Biometrika, 102, 371-380, and (2) Joseph, Gul, and Ba (2018) "Designing Computer Experiments with Multiple Types of Factors: The MaxPro Approach", Journal of Quality Technology, to appear.
Additive proportional odds model for ordinal data using Laplace P-splines. The combination of Laplace approximations and P-splines enables fast and flexible inference in a Bayesian framework. Specific approximations are proposed to account for the asymmetry in the marginal posterior distributions of non-penalized parameters. For more details, see Lambert and Gressani (2023) <doi:10.1177/1471082X231181173>; Preprint: <arXiv:2210.01668>.
Tool for producing Pen's parade graphs, useful for visualizing inequalities in income, wages or other variables, as proposed by Pen (1971, ISBN: 978-0140212594). Income or another economic variable is captured by the vertical axis, while the population is arranged in ascending order of income along the horizontal axis. Pen's income parades provide an easy-to-interpret visualization of economic inequalities.
This package provides tools for modeling non-continuous linear responses of ecological communities to environmental data. The package workflow proceeds in three steps: (1) data ordering (function OrdData()), (2) split-moving-window analysis (function SMW()) and (3) piecewise redundancy analysis (function pwRDA()), as sketched below. Relevant references include Cornelius and Reynolds (1991) <doi:10.2307/1941559> and Legendre and Legendre (2012, ISBN: 9780444538697).
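A three-step workflow sketch built from the functions named above; the simulated spp/env tables and all argument names beyond the three function names are assumptions:

    library(segRDA)
    spp <- matrix(rpois(100 * 10, 3), nrow = 100)  # community table (sites x species)
    env <- matrix(rnorm(100 * 3), nrow = 100)      # environmental table
    ord <- OrdData(x = env, y = spp)               # (1) data ordering
    sm  <- SMW(yo = ord$yo, ws = 20)               # (2) split-moving-window analysis
    pw  <- pwRDA(ord$xo, ord$yo, BPs = c(30, 70))  # (3) piecewise RDA at breakpoints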
This package provides functions to perform split robust least angle regression. The approach first uses the least angle regression algorithm to split the variables into the models of an ensemble, based on robust estimates of the correlation between predictors. An elastic net estimator is then applied to the selected predictors in each model, using the imputed data from the detecting deviating cells (DDC) method.
The goal of SAFEPG is to predict climate-related extreme losses by fitting a frequency-severity model. It improves predictive performance by introducing a sign-aligned regularization term, which ensures consistent signs for the coefficients across the frequency and severity components. This not only increases model accuracy but also improves interpretability, making the model more suitable for practical applications in risk assessment.
This package provides a Tcl/Tk Graphical User Interface (GUI) to display images that can be zoomed and panned using the mouse and keyboard shortcuts. tkImgR reads and writes several image formats (PPM/PGM, PNG and GIF) using the standard Tcl/Tk distribution (>=8.6), but other formats (JPEG, TIFF, CR2) can be handled using the tkImg package for Tcl/Tk.
This package provides functions for defining and conducting a time series prediction process, including pre(post)processing, decomposition, modelling, prediction and accuracy assessment. The generated models and their prediction errors can be used for benchmarking other time series prediction methods and for creating a demand for the refinement of such methods. For this purpose, benchmark data from prediction competitions may be used.
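A hedged fit-and-forecast sketch; arimapred() is the helper I recall TSPred exporting, so verify the exact signature:

    library(TSPred)
    pred <- arimapred(AirPassengers, n.ahead = 12)  # automatic ARIMA fit + 12-step forecast
    pred  # predictions to compare against a competition benchmark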
This package infers the V genotype of an individual from immunoglobulin (Ig) repertoire sequencing data (AIRR-Seq, Rep-Seq). Includes detection of any novel alleles. This information is then used to correct existing V allele calls from among the sample sequences. Citations: Gadala-Maria, et al (2015) <doi:10.1073/pnas.1417683112>, Gadala-Maria, et al (2019) <doi:10.3389/fimmu.2019.00129>.
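A hedged sketch of the genotyping workflow; findNovelAlleles() and inferGenotype() match the package vignette as I recall it, and AIRRDb/germline_ighv stand in for the user's AIRR-Seq table and IGHV germline reference:

    library(tigger)
    novel <- findNovelAlleles(AIRRDb, germline_ighv)  # detect novel V alleles
    geno  <- inferGenotype(AIRRDb, germline_db = germline_ighv,
                           novel = novel)  # infer and correct the V genotype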
The vcfpp.h library (<https://github.com/Zilong-Li/vcfpp>) provides an easy-to-use C++ API around htslib, offering full functionality for manipulating Variant Call Format (VCF) files. The vcfppR package serves as the R bindings of the vcfpp.h library, enabling rapid processing of both compressed and uncompressed VCF files. Explore a range of powerful features for efficient VCF data manipulation.
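A hedged reading sketch; vcftable() is the region reader I recall vcfppR exporting, so treat the argument names as assumptions:

    library(vcfppR)
    res <- vcftable("sample.vcf.gz", region = "chr21:1-5000000", vartype = "snps")
    str(res$gt)  # genotype matrix for the SNPs in the requested region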
An implementation of three procedures developed by John Tukey: FUNOP (FUll NOrmal Plot), FUNOR-FUNOM (FUll NOrmal Rejection-FUll NOrmal Modification), and vacuum cleaner. Combined, they provide a way to identify, treat, and analyze outliers in two-way (i.e., contingency) tables, as described in his landmark paper "The Future of Data Analysis", Tukey, John W. (1962) <https://www.jstor.org/stable/2237638>.
This is a package for fast image processing for images in up to 4 dimensions (two spatial dimensions, one time/depth dimension, one color dimension). It provides most traditional image processing tools (filtering, morphology, transformations, etc.) as well as various functions for easily analyzing image data using R. The package wraps CImg, a simple, modern C++ library for image processing.
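A short sketch with imager's bundled boats image, showing the filtering style of the API:

    library(imager)
    plot(boats)                  # color demo image shipped with imager
    g <- grayscale(boats)        # drop the color dimension
    plot(isoblur(g, sigma = 5))  # Gaussian (isotropic) blur, a CImg filter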
Inverse normal transformation (INT) based genetic association testing. These tests are recommended for continuous traits with non-normally distributed residuals. INT-based tests robustly control the type I error in settings where standard linear regression does not, as when the residual distribution exhibits excess skew or kurtosis. Moreover, INT-based tests outperform standard linear regression in terms of power. These tests may be classified into two types. In direct INT (D-INT), the phenotype is itself transformed. In indirect INT (I-INT), phenotypic residuals are transformed. The omnibus test (O-INT) adaptively combines D-INT and I-INT into a single robust and statistically powerful approach. See McCaw ZR, Lane JM, Saxena R, Redline S, Lin X. "Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies" <doi:10.1111/biom.13214>.
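A hedged sketch of the transformation and the omnibus test; RankNorm() is the rank-based INT I recall the package exporting, and the simulated genotype/covariate matrices plus the OINT() argument names are assumptions:

    library(RNOmni)
    set.seed(1)
    y <- rexp(1000)              # skewed continuous trait
    z <- RankNorm(y)             # rank-based inverse normal transformation
    G <- matrix(rbinom(1000 * 10, 2, 0.25), nrow = 1000)  # 10 SNP genotypes
    X <- cbind(1, rnorm(1000))   # covariates with intercept
    p <- OINT(y = y, G = G, X = X)  # omnibus INT-based association test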
An R implementation for the Strain Elevation and Tension embedding algorithm from Bourne (2020) <doi:10.1007/s41109-020-00329-4>. The package embeds graphs and networks using the Strain Elevation and Tension embedding (SETSe) algorithm. SETSe represents the network as a physical system, where edges are elastic, and nodes exert a force either up or down based on node features. SETSe positions the nodes vertically such that the tension in the edges of a node is equal and opposite to the force it exerts for all nodes in the network. The resultant structure can then be analysed by looking at the node elevation and the edge strain and tension. This algorithm works on weighted and unweighted networks as well as networks with or without explicit node features. Edge elasticity can be created from existing edge weights or kept as a constant.
The agghoo procedure is an alternative to usual cross-validation. Instead of choosing the best model trained on V subsamples, it determines a winner model for each subsample, and then aggregates the V outputs. For the details, see "Aggregated hold-out" by Guillaume Maillard, Sylvain Arlot and Matthieu Lerasle (2021) <arXiv:1909.04890>, published in Journal of Machine Learning Research 22(20):1-55.
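A plain-R illustration of the aggregated hold-out idea itself (not the package API): for each of V splits, select the best of several candidate models on the hold-out, then aggregate the V winners' predictions:

    set.seed(1)
    n <- 200; x <- runif(n); y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
    V <- 5; degrees <- 1:8  # candidate polynomial models
    preds <- replicate(V, {
      hold <- sample(n, n / 5)  # hold-out indices for this split
      fits <- lapply(degrees, function(d) lm(y ~ poly(x, d), subset = -hold))
      errs <- sapply(fits, function(f)
        mean((y[hold] - predict(f, data.frame(x = x[hold])))^2))
      predict(fits[[which.min(errs)]], data.frame(x = x))  # this split's winner
    })
    y_agghoo <- rowMeans(preds)  # aggregate the V winner models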
For a binary classification, the adjusted sensitivity and specificity are measured for a given fixed threshold. If the threshold for either sensitivity or specificity is not given, the crossing point between the sensitivity and specificity curves is returned. For bootstrap procedures, mean and CI bootstrap values of sensitivity, specificity, the crossing point between sensitivity and specificity, as well as AUC and AUCPR, can be evaluated.