This package provides partial least squares (PLS) regression and various regular, sparse, or kernel techniques for fitting Cox models in high-dimensional settings, as described in Bastien, P., Bertrand, F., Meyer, N., and Maumy-Bertrand, M. (2015), Deviance residuals-based sparse PLS and sparse kernel PLS regression for censored data, Bioinformatics, 31(3):397-404 <doi:10.1093/bioinformatics/btu660>. Cross-validation criteria were studied in Bertrand, F., Bastien, Ph., and Maumy-Bertrand, M. (2018), Cross validating extensions of kernel, sparse or regular partial least squares regression models to censored data <arXiv:1810.02962>.
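The cited method builds its PLS step on deviance residuals from a Cox model. The following is a minimal Python sketch that assumes the residuals come from a null (covariate-free) Cox model whose baseline cumulative hazard is the Nelson-Aalen estimate. It is an illustrative re-implementation, not the package's R code, and the function name is hypothetical.

```python
import math


def null_cox_deviance_residuals(times, events):
    """Deviance residuals of a Cox model without covariates (beta = 0).

    The cumulative baseline hazard is the Nelson-Aalen estimate; tied
    event times are grouped. events[i] is 1 for an event, 0 if censored.
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    cum_hazard = []  # Nelson-Aalen value at each distinct event time
    h = 0.0
    for t in event_times:
        d_j = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        n_j = sum(1 for ti in times if ti >= t)  # size of the risk set
        h += d_j / n_j
        cum_hazard.append(h)

    def hazard_at(t):
        # Right-continuous step function: last value at an event time <= t
        value = 0.0
        for et, hv in zip(event_times, cum_hazard):
            if et > t:
                break
            value = hv
        return value

    residuals = []
    for t, d in zip(times, events):
        m = d - hazard_at(t)  # martingale residual
        inner = m + math.log(d - m) if d == 1 else m
        # max() guards against a tiny negative value from rounding
        residuals.append(math.copysign(math.sqrt(max(0.0, -2.0 * inner)), m))
    return residuals
```

For a censored observation this reduces to -sqrt(2H(t)), so censored subjects always receive non-positive residuals.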
Calculates the periodogram of a time series, fits an Ornstein-Uhlenbeck state space (OUSS) null model by maximum likelihood, and evaluates the statistical significance of periodogram peaks against the OUSS null hypothesis. The OUSS is a parsimonious model for stochastically fluctuating variables with linear stabilizing forces, subject to uncorrelated measurement errors. Unlike the classical white noise null model for detecting cyclicity, the OUSS model can account for the temporal correlations typically found in ecological and geological time series. Citation: Louca, Stilianos and Doebeli, Michael (2015) <doi:10.1890/14-0126.1>.
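The first step, computing a periodogram, can be sketched in Python with NumPy as below. This is a generic, minimal periodogram that assumes evenly spaced samples. The normalization used here (power scaled by dt/n) is one common convention and may differ from the package's. The OUSS fit and the significance test are not reproduced.

```python
import numpy as np


def periodogram(x, dt=1.0):
    """Classical periodogram of an evenly sampled series (mean removed).

    Returns the positive Fourier frequencies (cycles per unit time,
    zero frequency excluded) and the corresponding power.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=dt)
    power = (np.abs(spectrum) ** 2) * dt / n
    return freqs[1:], power[1:]
```

A pure sinusoid at 0.1 cycles per step, sampled for 200 steps, produces its largest peak at frequency 0.1.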
Inspired by the space-time regressions often performed to assess the expansion of the Neolithic from the Near East to Europe (Pinhasi et al. 2005 <doi:10.1371/journal.pbio.0030410>), this package tests for significant correlations between the (earliest) radiocarbon dates of archaeological sites and their distances from a hypothetical center of origin. Both ordinary least squares (OLS) and reduced major axis (RMA) methods are supported (Russell et al. 2014 <doi:10.1371/journal.pone.0087854>). It is also possible to iterate over many sites to identify the most likely origin.
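To illustrate how the two supported fitting methods differ, the following Python sketch computes both slopes of site date against distance from a candidate origin. The function name and plain-list inputs are illustrative assumptions. Significance testing and the iteration over candidate origins are omitted.

```python
import math


def ols_and_rma_slopes(distance, date):
    """Slopes of date on distance by OLS and by reduced major axis (RMA)."""
    n = len(distance)
    mx = sum(distance) / n
    my = sum(date) / n
    sxx = sum((x - mx) ** 2 for x in distance)
    syy = sum((y - my) ** 2 for y in date)
    sxy = sum((x - mx) * (y - my) for x, y in zip(distance, date))
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    ols = sxy / sxx
    # RMA slope: ratio of standard deviations, carrying the sign of r
    rma = math.copysign(math.sqrt(syy / sxx), r)
    return ols, rma, r
```

The RMA slope equals the OLS slope divided by |r|, so the two methods agree only when the correlation is perfect.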
Many relevant applications in the environmental and socioeconomic sciences use areal data, such as biodiversity checklists, agricultural statistics, or socioeconomic surveys. For applications that exceed the spatial, temporal, or thematic scope of any single data source, data must be integrated from several heterogeneous sources. Inconsistent concepts and definitions, or messy data tables, make this a tedious and error-prone process. arealDB tackles these problems and helps the user integrate harmonised databases of areal data. For details, see Ehrmann, Seppelt & Meyer (2020) <doi:10.1016/j.envsoft.2020.104799>.
This package provides functions to simulate data sets from hierarchical ecological models, including all the simulations described in the two-volume publication "Applied Hierarchical Modeling in Ecology: Analysis of distribution, abundance and species richness in R and BUGS" by Marc Kéry and Andy Royle: volume 1 (2016, ISBN: 978-0-12-801378-6) and volume 2 (2021, ISBN: 978-0-12-809585-0), <https://www.mbr-pwrc.usgs.gov/pubanalysis/keryroylebook/>. It also includes all the utility functions and data sets needed to replicate the analyses shown in the books.
Bayesian kernel machine regression (from the 'bkmr' package) is a Bayesian semi-parametric generalized linear model approach under identity and probit links. This package provides functions that extend Bayesian kernel machine regression fits to allow multiple-chain inference and diagnostics, leveraging functions from the 'future', 'rstan', and 'coda' packages. Reference: Bobb, J. F., Henn, B. C., Valeri, L., & Coull, B. A. (2018). Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression <doi:10.1186/s12940-018-0413-y>.
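Multiple-chain inference is usually judged with convergence diagnostics such as the Gelman-Rubin potential scale reduction factor (R-hat). The Python sketch below computes the classic form of this statistic for one parameter. It is not the package's implementation. Packaged versions, for example in coda or rstan, add refinements such as degrees-of-freedom corrections or rank-normalized split chains, so their values will differ somewhat.

```python
import numpy as np


def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin R-hat for one parameter.

    chains: array-like of shape (n_chains, n_draws).
    """
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    chain_means = chains.mean(axis=1)
    w = chains.var(axis=1, ddof=1).mean()  # mean within-chain variance
    b = n * chain_means.var(ddof=1)        # between-chain variance
    var_hat = (n - 1) / n * w + b / n      # pooled variance estimate
    return float(np.sqrt(var_hat / w))
```

Values well above 1 signal that the chains have not yet mixed, because their means disagree relative to their internal spread.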
Quantitative characterization of the health impacts associated with exposure to chemical mixtures has received considerable attention in current environmental and epidemiological studies. The CompMix package allows practitioners to estimate the health impacts of exposure to chemical mixtures through various statistical approaches, including Lasso, Elastic net, Bayesian kernel machine regression (BKMR), hierNet, Quantile g-computation, Weighted quantile sum (WQS), and Random forest. Hao W, Cathey A, Aung M, Boss J, Meeker J, Mukherjee B. (2024) "Statistical methods for chemical mixtures: a practitioners guide". <doi:10.1101/2024.03.03.24303677>.
Clusters longitudinal trajectories on a common time axis; the series may be unequally spaced, of unequal length, and/or partially overlapping. Performs k-means clustering on a single continuous variable measured over time, where each cluster mean is defined by a thin plate spline fit to all points in the cluster. The distance between a trajectory and a cluster is the mean squared error (MSE) between the trajectory's points and the cluster spline. Provides graphs of the derived cluster splines, silhouette plots, and Adjusted Rand Index evaluations of the number of clusters. Scales well to large data, with multicore parallelism available to speed up computation.
This package implements a modern, unified estimation strategy for common mediation estimands (natural effects, organic effects, interventional effects, and recanting twins) in combination with modified treatment policies, as described in Liu, Williams, Rudolph, and Díaz (2024) <doi:10.48550/arXiv.2408.14620>. Estimation makes use of recent advances in Riesz learning to estimate a set of required nuisance parameters with deep learning. As a result, mediation effects can be estimated using machine learning for binary, categorical, continuous, or multivariate exposures, with high-dimensional mediators and mediator-outcome confounders.
Estimation of distributed lag models (DLMs) based on a Bayesian additive regression trees framework. Includes several extensions of DLMs: treed DLMs and distributed lag mixture models (Mork and Wilson, 2023) <doi:10.1111/biom.13568>; treed distributed lag nonlinear models (Mork and Wilson, 2022) <doi:10.1093/biostatistics/kxaa051>; heterogeneous DLMs (Mork et al., 2024) <doi:10.1080/01621459.2023.2258595>; monotone DLMs (Mork and Wilson, 2024) <doi:10.1214/23-BA1412>. The package also includes visualization tools and a Shiny interface to check model convergence and help interpret results.
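For readers new to DLMs, the basic model writes the outcome at time t as a weighted sum of current and past exposures: y_t = β_0 x_t + β_1 x_(t-1) + ... + β_L x_(t-L). The Python sketch below builds the lagged design matrix and fits the lag coefficients by ordinary least squares. This is the classical unconstrained DLM, not the Bayesian additive regression tree approach used by the package. All names and the simulated data are illustrative.

```python
import numpy as np


def lagged_design(x, max_lag):
    """Rows t = max_lag..T-1 with columns x[t], x[t-1], ..., x[t-max_lag]."""
    x = np.asarray(x, dtype=float)
    t_count = x.size
    return np.column_stack(
        [x[max_lag - lag : t_count - lag] for lag in range(max_lag + 1)]
    )


rng = np.random.default_rng(0)
exposure = rng.normal(size=200)
true_lag_effects = np.array([0.5, 0.3, 0.1])  # effects of lags 0, 1, 2
X = lagged_design(exposure, max_lag=2)
outcome = X @ true_lag_effects  # noise-free outcome, for illustration only
estimated, *_ = np.linalg.lstsq(X, outcome, rcond=None)
```

With a noise-free outcome, least squares recovers the lag effects exactly. Real exposure series are often highly autocorrelated, which is one motivation for the constrained or tree-based DLMs listed above.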
This package performs fragment analysis using genetic data from capillary electrophoresis machines: files with the .fsa extension and .txt files from the Beckman CEQ 8000 system, both of which contain DNA fragment intensities read by the instrument. In addition to visualization, it performs automatic scoring of SSRs (simple sequence repeats, a type of genetic marker very common across the genome) and other types of PCR (polymerase chain reaction) markers in biparental populations such as F1, F2, and BC (backcross), as well as in diversity panels (collections of genetic diversity).
Allows the user to run radial data envelopment analysis (DEA) models interactively. The user can upload a data frame, select the input/output variables, choose the technology assumption to adopt, and decide whether to run an input- or output-oriented model. When the model is executed, a set of results is displayed, including efficiency scores, peer determination, scale efficiency evaluation, and slack calculation. For more information about the theoretical background of the package, please refer to Bogetoft & Otto (2011) <doi:10.1007/978-1-4419-7961-2>.
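As a worked illustration of a radial efficiency score, consider the special case of one input and one output under constant returns to scale (CRS). This case needs no linear program: a unit's input-oriented efficiency is its output/input ratio relative to the best ratio observed. The Python sketch below covers only this special case, and the function name is hypothetical. The general models, with several inputs and outputs, other technology assumptions, peers, and slacks, require solving linear programs.

```python
def crs_efficiency_single_io(inputs, outputs):
    """Input-oriented CRS efficiency for one input and one output.

    Each unit's output/input ratio is divided by the best ratio in the
    sample, so efficient units score 1.0.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]
```

For example, with inputs [2, 4, 5] and outputs [2, 2, 5], the second unit scores 0.5: it could, in principle, produce its output with half its input.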
This package provides functions to assess the calibration of logistic regression models with the GiViTI (Gruppo Italiano per la Valutazione degli interventi in Terapia Intensiva, Italian Group for the Evaluation of the Interventions in Intensive Care Units; see <http://www.giviti.marionegri.it/>) approach. The approach consists of a graphical tool, namely the GiViTI calibration belt, and an associated statistical test. These tools can be used both to evaluate internal calibration (i.e., goodness of fit) and to assess the validity of an externally developed model.
We implemented multiple tests based on the restricted mean time lost (RMTL) for general factorial designs, as described in Munko et al. (2024) <doi:10.48550/arXiv.2409.07917>. To this end, an asymptotic test and a permutation test, both based on a Wald-type test statistic, are included. The asymptotic test takes the exact asymptotic dependence structure of the test statistics into account to gain power. Furthermore, confidence intervals for RMTL contrasts can be calculated and plotted, and a stepwise extension that can improve the power of the multiple tests is available.
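For a single event type, the RMTL up to a horizon τ is the area between 1 and the survival curve on [0, τ], that is, τ minus the restricted mean survival time. The Python sketch below estimates it from a Kaplan-Meier curve. It is a simplified illustration only, and the function name is hypothetical. It does not handle competing risks, where the RMTL is defined through cumulative incidence functions, and it omits the Wald-type statistics and the multiple testing procedures.

```python
def km_rmtl(times, events, tau):
    """Restricted mean time lost up to tau for a single event type.

    RMTL = tau - RMST, where RMST is the area under the Kaplan-Meier
    curve on [0, tau]. events[i] is 1 for an event, 0 if censored.
    """
    data = sorted(zip(times, events))
    surv, area, prev_time = 1.0, 0.0, 0.0
    n_at_risk = len(data)
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        area += surv * (t - prev_time)  # survival is flat since prev_time
        deaths = exits = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            exits += 1
            i += 1
        surv *= 1.0 - deaths / n_at_risk
        n_at_risk -= exits
        prev_time = t
    area += surv * (tau - prev_time)
    return tau - area
```

Without censoring, this equals τ minus the sample mean of min(T, τ). For event times 1, 2, and 3 with τ = 3, the result is 3 - 2 = 1.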
Assists researchers in choosing Key Opinion Leaders (KOLs) in a network to help disseminate, or encourage adoption of, an innovation by other network members. Potential KOL teams are evaluated using the ABCDE framework (Neal et al., 2025 <doi:10.31219/osf.io/3vxy9_v1>). This framework considers (1) the team members' Availability, (2) the Breadth of the team's network coverage, (3) the Cost of recruiting a team of a given size, and (4) the Diversity of the team's members, which are (5) pooled into a single Evaluation score.
An interface to Neptune, a metadata store for MLOps built for teams that run a lot of experiments. It gives you a single place to log, store, display, organize, compare, and query all your model-building metadata. Neptune is used for: • Experiment tracking: log, display, organize, and compare ML experiments in a single place. • Model registry: version, store, manage, and query trained models and model-building metadata. • Monitoring ML runs live: record and monitor model training, evaluation, or production runs live. For more information, see <https://neptune.ai/>.
The functions allow for the numerical evaluation of some commonly used entropy measures, such as Shannon entropy, Rényi entropy, Havrda and Charvát entropy, and Arimoto entropy, at selected parameter values of several well-known and widely used probability distributions. The functions also compute the relative loss of these entropies for the corresponding truncated distributions. Related works include: Awad, A. M., & Alawneh, A. J. (1987). Application of entropy to a life-time model. IMA Journal of Mathematical Control and Information, 4(2), 143-148. <doi:10.1093/imamci/4.2.143>.
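Two of these measures can be made concrete for a density f. The Shannon entropy is -∫ f log f, and the Rényi entropy of order α ≠ 1 is log(∫ f^α) / (1 - α). The Python sketch below evaluates both numerically for an exponential distribution with rate λ and compares them with the closed forms 1 - log λ and log(α)/(α - 1) - log λ. It illustrates these standard definitions and is not the package's implementation. The Havrda and Charvát and Arimoto measures and the truncation analysis are omitted.

```python
import math


def numeric_entropies(pdf, upper, alpha, steps=100000):
    """Shannon and Renyi (order alpha) entropies of a density supported
    on [0, upper], approximated with the midpoint rule."""
    h = upper / steps
    shannon = 0.0
    power_integral = 0.0
    for k in range(steps):
        f = pdf((k + 0.5) * h)
        if f > 0:
            shannon -= f * math.log(f) * h
            power_integral += f ** alpha * h
    renyi = math.log(power_integral) / (1.0 - alpha)
    return shannon, renyi


RATE = 2.0


def exp_pdf(x):
    # Exponential density with rate RATE
    return RATE * math.exp(-RATE * x)


shannon, renyi = numeric_entropies(exp_pdf, upper=20.0, alpha=2.0)
# Closed forms for the exponential distribution, for comparison
shannon_exact = 1.0 - math.log(RATE)
renyi_exact = math.log(2.0) / (2.0 - 1.0) - math.log(RATE)
```

The upper limit of 20 truncates a negligible tail mass (about e^-40) for this rate, so the numeric values match the closed forms closely.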
It is a single-cell active pathway analysis tool based on graph neural networks (Scarselli et al. (2009) <doi:10.1109/TNN.2008.2005605>; Kipf and Welling (2017) <arXiv:1609.02907v4>). It constructs the gene-cell association network, infers pathway activity scores from data of different single-cell modalities, integrates data from multiple modalities on the same cells into one pathway activity score matrix, identifies gene modules activated in particular cell phenotypes, and parses association networks of gene modules across multiple cell phenotypes. In addition, a range of visualization functions is provided to display the results.
Digital Expression Explorer 2 (or DEE2 for short) is a repository of processed RNA-seq data in the form of counts. It was designed so that researchers could undertake re-analysis and meta-analysis of published RNA-seq studies quickly and easily. As of April 2020, over 1 million SRA datasets had been processed. This package provides an R interface to access these expression data. More information about the DEE2 project can be found at the project homepage <http://dee2.io> and in the main publication <doi:10.1093/gigascience/giz022>.
PathMED is a collection of tools to facilitate precision medicine studies with omics data (e.g., transcriptomics). Among its functionalities, gene set scores for individual samples can be calculated with several methods. These scores can be used to train machine learning models and to predict clinical features on new data. To this end, several machine learning methods are evaluated using internal validation to select the best method and tune its hyperparameters. Performance metrics and a ready-to-use model to predict outcomes for new patients are returned.
zitools enables the analysis of zero-inflated count data either by down-weighting excess zeros or by replacing an appropriate proportion of excess zeros with NA. By overloading frequently used statistical functions (such as mean, median, and standard deviation), plotting functions (such as boxplots or heatmaps), and differential abundance tests, it allows a wide range of downstream analyses of zero-inflated data with less bias. This is particularly relevant for microbiome analyses, where the data are often overdispersed and zero-inflated, which makes analysis extremely challenging.
This package provides a collection of convenient functions for common statistical computations that are not directly provided by R's 'base' or 'stats' packages. First, it aims to provide shortcuts for statistical measures that could otherwise only be calculated with additional effort. Second, these shortcut functions are generic and can be applied not only to vectors but also to other objects. Most functions focus on summary statistics or fit measures for regression models, including generalized linear models, mixed effects models, and Bayesian models.
Computes conditional multivariate t probabilities, random deviates, and densities. It can also be used to introduce missing values into a dataset under a missing at random (MAR) mechanism. The package includes the Expectation-Maximization (EM), Monte Carlo EM, and Stochastic EM algorithms for imputing missing values in datasets assuming the multivariate t distribution. See Kinyanjui, Tamba, Orawo, and Okenye (2020) <doi:10.3233/mas-200493> and Kinyanjui, Tamba, and Okenye (2021) <http://www.ceser.in/ceserp/index.php/ijamas/article/view/6726/0> for more details.
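For reference, the p-dimensional multivariate t density with location μ, scale matrix Σ, and ν degrees of freedom is f(x) = Γ((ν+p)/2) / [Γ(ν/2) (νπ)^(p/2) |Σ|^(1/2)] · [1 + (x-μ)ᵀ Σ⁻¹ (x-μ)/ν]^(-(ν+p)/2). The Python sketch below evaluates this density with NumPy. It is a standalone illustration, not the package's code. Conditional probabilities, sampling, and the EM-type imputation are not shown.

```python
import math

import numpy as np


def mvt_density(x, mean, scale, df):
    """Multivariate t density with location `mean`, scale matrix `scale`,
    and `df` degrees of freedom."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    scale = np.asarray(scale, dtype=float)
    p = x.size
    diff = x - mean
    # Squared Mahalanobis distance, solving a linear system instead of inverting
    maha = float(diff @ np.linalg.solve(scale, diff))
    log_norm = (
        math.lgamma((df + p) / 2.0)
        - math.lgamma(df / 2.0)
        - (p / 2.0) * math.log(df * math.pi)
        - 0.5 * math.log(np.linalg.det(scale))
    )
    return math.exp(log_norm - (df + p) / 2.0 * math.log1p(maha / df))
```

As a check, with p = 1, Σ = 1, and ν = 1, the density is the standard Cauchy density, so its value at 0 is 1/π. For very large ν, the density approaches the corresponding normal density.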
This package provides a set of functions to quantify the relationship between development rate and temperature and to build phenological models. The package comprises a set of models and estimated parameters drawn from a literature review on ectotherms. The methods and literature review are described in Rebaudo et al. (2018) <doi:10.1111/2041-210X.12935>, Rebaudo and Rabhi (2018) <doi:10.1111/eea.12693>, and Regnier et al. (2021) <doi:10.1093/ee/nvab115>. An example can be found in Rebaudo et al. (2017) <doi:10.1007/s13355-017-0480-5>.
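The Python sketch below shows how a development-rate model can feed a simple phenological model. It uses the Brière-1 equation, one widely used nonlinear rate model, r(T) = a T (T - T_L) √(T_M - T) for T_L < T < T_M. It then accumulates daily rates until development is complete, defined here as a cumulative rate of 1. The parameter values are made up, the function names are hypothetical, and the code is not the package's R implementation.

```python
import math
from functools import partial


def briere1_rate(temp, a, t_low, t_max):
    """Brière-1 development rate (per day); zero outside (t_low, t_max)."""
    if temp <= t_low or temp >= t_max:
        return 0.0
    return a * temp * (temp - t_low) * math.sqrt(t_max - temp)


def days_to_complete(daily_temps, rate_fn):
    """First day on which cumulative development reaches 1, or None."""
    progress = 0.0
    for day, temp in enumerate(daily_temps, start=1):
        progress += rate_fn(temp)
        if progress >= 1.0:
            return day
    return None


# Hypothetical parameters, for illustration only
example_rate = partial(briere1_rate, a=1e-4, t_low=10.0, t_max=35.0)
days = days_to_complete([26.0] * 30, example_rate)
```

With these made-up parameters, a constant 26 °C gives a daily rate of 0.1248, so development completes on day 9. A constant 5 °C lies below T_L, so development never completes.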