Bayesian kernel machine regression (from the bkmr package) is a Bayesian semi-parametric generalized linear model approach under identity and probit links. This package provides a number of functions that extend Bayesian kernel machine regression fits to allow multiple-chain inference and diagnostics, leveraging functions from the future, rstan, and coda packages. Reference: Bobb, J. F., Henn, B. C., Valeri, L., & Coull, B. A. (2018). Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. <doi:10.1186/s12940-018-0413-y>.
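Multiple-chain diagnostics of this kind usually compare between-chain and within-chain variability. The Python sketch below computes the classic Gelman-Rubin potential scale reduction factor (R-hat) for a single parameter. It illustrates the idea and is not the package's code, which relies on rstan and coda. The function name is hypothetical, and newer variants such as split or rank-normalized R-hat differ in detail.

```python
import math
import statistics


def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin R-hat for one parameter (hypothetical helper).

    `chains` is a list of equal-length lists of posterior draws, one per chain.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # Between-chain variance B (scaled by the chain length n)
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    # Within-chain variance W: average of the per-chain sample variances
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


well_mixed = [[0.1, -0.3, 0.2, 0.0], [0.0, 0.2, -0.1, 0.1]]
stuck = [[0.1, -0.3, 0.2, 0.0], [5.0, 5.2, 4.9, 5.1]]
print(gelman_rubin_rhat(well_mixed))  # below 1.1
print(gelman_rubin_rhat(stuck))       # roughly 20: the chains disagree
```

Values well above 1 suggest that the chains have not converged to the same distribution. A common rule of thumb flags values above about 1.1.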
Clusters longitudinal trajectories over time (which can be unequally spaced, of unequal length, and/or partially overlapping) on a common time axis. Performs k-means clustering on a single continuous variable measured over time, where each cluster mean is defined by a thin plate spline fit to all points in the cluster. The distance between a trajectory and a cluster is the mean squared error (MSE) of the trajectory's points relative to the cluster spline. Provides graphs of the derived cluster splines, silhouette plots, and Adjusted Rand Index evaluations of the number of clusters. Scales well to large data, with multicore parallelism available to speed up computation.
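The assignment step of this k-means variant can be sketched as follows. Each trajectory goes to the cluster whose mean curve gives the smallest MSE over that trajectory's own observed points, which is why unequal lengths and time points pose no problem. This is a minimal Python sketch, not the package's code. It assumes that the cluster curves are already-fitted callables (the thin plate spline fit and the update step are omitted), and the function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (time, value)


def trajectory_mse(points: Sequence[Point], curve: Callable[[float], float]) -> float:
    """Mean squared error of a trajectory's points relative to a cluster curve."""
    return sum((y - curve(t)) ** 2 for t, y in points) / len(points)


def assign_clusters(
    trajectories: Sequence[Sequence[Point]],
    curves: Sequence[Callable[[float], float]],
) -> List[int]:
    """k-means assignment step: index of the nearest curve (by MSE) per trajectory.

    Trajectories may differ in length and time points, because each one is
    compared with a curve that can be evaluated anywhere on the common axis.
    """
    return [
        min(range(len(curves)), key=lambda k: trajectory_mse(traj, curves[k]))
        for traj in trajectories
    ]


# Hypothetical example: two cluster curves, one flat and one linear in time.
curves = [lambda t: 0.0, lambda t: t]
trajectories = [
    [(0.0, 0.1), (1.0, 1.1), (2.0, 1.9)],  # follows y = t
    [(0.5, -0.2), (3.0, 0.1)],             # stays near 0, different times and length
]
print(assign_clusters(trajectories, curves))  # [1, 0]
```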
Quantitative characterization of the health impacts associated with exposure to chemical mixtures has received considerable attention in current environmental and epidemiological studies. The CompMix package allows practitioners to estimate the health impacts of exposure to chemical mixtures through various statistical approaches, including Lasso, Elastic net, Bayesian kernel machine regression (BKMR), hierNet, Quantile g-computation, Weighted quantile sum (WQS) and Random forest. Hao W, Cathey A, Aung M, Boss J, Meeker J, Mukherjee B. (2024) "Statistical methods for chemical mixtures: a practitioners guide". <DOI:10.1101/2024.03.03.24303677>.
This package implements a modern, unified estimation strategy for common mediation estimands (natural effects, organic effects, interventional effects, and recanting twins) in combination with modified treatment policies as described in Liu, Williams, Rudolph, and Díaz (2024) <doi:10.48550/arXiv.2408.14620>. Estimation makes use of recent advancements in Riesz-learning to estimate a set of required nuisance parameters with deep learning. The result is the capability to estimate mediation effects with binary, categorical, continuous, or multivariate exposures with high-dimensional mediators and mediator-outcome confounders using machine learning.
Estimation of distributed lag models (DLMs) based on a Bayesian additive regression trees framework. Includes several extensions of DLMs: treed DLMs and distributed lag mixture models (Mork and Wilson, 2023) <doi:10.1111/biom.13568>; treed distributed lag nonlinear models (Mork and Wilson, 2022) <doi:10.1093/biostatistics/kxaa051>; heterogeneous DLMs (Mork et al., 2024) <doi:10.1080/01621459.2023.2258595>; monotone DLMs (Mork and Wilson, 2024) <doi:10.1214/23-BA1412>. The package also includes visualization tools and a shiny interface to check model convergence and to help interpret results.
This package performs fragment analysis using genetic data from capillary electrophoresis machines. Supported inputs are files with the .fsa extension (the Applied Biosystems fragment analysis format) and .txt files from the Beckman CEQ 8000 system; both contain DNA fragment intensities read by the instrument. In addition to visualization, it performs automatic scoring of SSRs (Simple Sequence Repeats, a type of genetic marker very common across the genome) and other types of PCR (Polymerase Chain Reaction) markers in biparental populations such as F1, F2, and BC (backcross), and in diversity panels (collections of genetic diversity).
Allows the user to interactively run radial data envelopment analysis (DEA) models. The user can upload a data frame, select the input/output variables, choose the technology assumption to adopt, and decide whether to run an input-oriented or an output-oriented model. When the model is executed, a set of results is displayed, including efficiency scores, peer identification, scale efficiencies, and slacks. For more information about the theoretical background of the package, please refer to Bogetoft & Otto (2011) <doi:10.1007/978-1-4419-7961-2>.
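Radial DEA models are normally solved as one linear program per unit. In the special case of one input, one output, and constant returns to scale (CRS), the input-oriented program has a closed form: a unit's efficiency is its output/input ratio divided by the best ratio observed. The Python sketch below uses this special case to show what an efficiency score and a peer mean. It is not the package's implementation, the function name is hypothetical, and multi-input/output or variable-returns models still require a linear program solver.

```python
def crs_input_efficiency(inputs, outputs):
    """Input-oriented CRS efficiency scores for single-input, single-output units.

    In this special case the radial CCR linear program has a closed form:
    unit k's score is its output/input ratio divided by the best ratio.
    """
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("inputs and outputs must be non-empty and of equal length")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    scores = [r / best for r in ratios]
    # Peers: the efficient units that define the frontier for all other units.
    peers = [j for j, r in enumerate(ratios) if r == best]
    return scores, peers


scores, peers = crs_input_efficiency([2.0, 4.0, 5.0], [2.0, 2.0, 5.0])
print(scores)  # [1.0, 0.5, 1.0]
print(peers)   # [0, 2]
```

A score of 0.5 for the second unit means that, under CRS, it could produce its current output with half of its current input.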
This package provides functions to assess the calibration of logistic regression models with the GiViTI (Gruppo Italiano per la Valutazione degli interventi in Terapia Intensiva, Italian Group for the Evaluation of the Interventions in Intensive Care Units - see <http://www.giviti.marionegri.it/>) approach. The approach consists of a graphical tool, namely the GiViTI calibration belt, and the associated statistical test. These tools can be used both to evaluate the internal calibration (i.e., the goodness of fit) and to assess the validity of an externally developed model.
We implemented multiple tests based on the restricted mean time lost (RMTL) for general factorial designs as described in Munko et al. (2024) <doi:10.48550/arXiv.2409.07917>. Specifically, an asymptotic test and a permutation test, both based on a Wald-type test statistic, are incorporated. The asymptotic test takes the asymptotically exact dependence structure of the test statistics into account to gain more power. Furthermore, confidence intervals for RMTL contrasts can be calculated and plotted, and a stepwise extension that can improve the power of the multiple tests is available.
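The RMTL up to a horizon τ is the area under the cumulative distribution of the event time, RMTL(τ) = ∫₀^τ F(t) dt. With competing risks, F is the cumulative incidence function of the cause of interest. The Python sketch below covers only the simpler single-event case, where F = 1 - S and S is estimated by Kaplan-Meier, so that RMTL(τ) = τ - RMST(τ). It is an illustration under that assumption, not the package's estimator; the function name is hypothetical, and no tests or confidence intervals are computed.

```python
from itertools import groupby


def km_rmtl(times, events, tau):
    """Restricted mean time lost up to tau from a Kaplan-Meier estimate.

    times: observed follow-up times; events: 1 for an event, 0 for censoring.
    RMTL(tau) = integral from 0 to tau of (1 - S(t)) dt = tau - RMST(tau).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, prev_t, rmst = 1.0, 0.0, 0.0
    for t, group in groupby(data, key=lambda pair: pair[0]):
        if t > tau:
            break
        group = list(group)
        rmst += surv * (t - prev_t)        # S(t) is flat on [prev_t, t)
        deaths = sum(e for _, e in group)
        surv *= 1.0 - deaths / n_at_risk   # Kaplan-Meier step at time t
        n_at_risk -= len(group)            # events and censored leave the risk set
        prev_t = t
    rmst += surv * (tau - prev_t)          # last flat piece up to tau
    return tau - rmst


print(km_rmtl([1, 2, 3, 4], [1, 1, 1, 1], tau=4))  # approximately 1.5
```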
Assists researchers in choosing Key Opinion Leaders (KOLs) in a network to help disseminate or encourage adoption of an innovation by other network members. Potential KOL teams are evaluated using the ABCDE framework (Neal et al., 2025 <doi:10.31219/osf.io/3vxy9_v1>). This framework considers (1) the team members' Availability, (2) the Breadth of the team's network coverage, (3) the Cost of recruiting a team of a given size, and (4) the Diversity of the team's members; these are (5) pooled into a single Evaluation score.
An interface to Neptune, a metadata store for MLOps built for teams that run a lot of experiments. It gives you a single place to log, store, display, organize, compare, and query all your model-building metadata. Neptune is used for: • Experiment tracking: log, display, organize, and compare ML experiments in a single place. • Model registry: version, store, manage, and query trained models and model-building metadata. • Monitoring ML runs live: record and monitor model training, evaluation, or production runs live. For more information, see <https://neptune.ai/>.
The functions allow for the numerical evaluation of some commonly used entropy measures, such as Shannon entropy, Rényi entropy, Havrda and Charvat entropy, and Arimoto entropy, at selected parameter values of several well-known and widely used probability distributions. Moreover, the functions also compute the relative loss of these entropies for the corresponding truncated distributions. Related works include: Awad, A. M., & Alawneh, A. J. (1987). Application of entropy to a life-time model. IMA Journal of Mathematical Control and Information, 4(2), 143-148. <doi:10.1093/imamci/4.2.143>.
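The Python sketch below illustrates what numerical evaluation involves. It integrates the Shannon and Rényi entropies of an exponential density with Simpson's rule and compares the results with the closed forms 1 - log λ and -log λ + log α / (α - 1). It is not the package's code, and the helper names are hypothetical. Havrda-Charvat and Arimoto entropies are omitted because their normalizations vary across references. Cutting the infinite support at a large bound is a purely numerical convenience here, unrelated to the package's truncated-distribution losses.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson's rule on [a, b] with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def exponential_pdf(rate):
    return lambda x: rate * math.exp(-rate * x)


def shannon_entropy(pdf, a, b):
    """Minus the integral of f log f over [a, b]."""
    def integrand(x):
        p = pdf(x)
        return -p * math.log(p) if p > 0 else 0.0
    return simpson(integrand, a, b)


def renyi_entropy(pdf, alpha, a, b):
    """log(integral of f**alpha) / (1 - alpha), for alpha > 0 and alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(simpson(lambda x: pdf(x) ** alpha, a, b)) / (1 - alpha)


rate = 2.0
f = exponential_pdf(rate)
# Support [0, inf) is cut at 20, where the remaining tail mass exp(-40) is negligible.
print(shannon_entropy(f, 0.0, 20.0), 1 - math.log(rate))
print(renyi_entropy(f, 2.0, 0.0, 20.0), -math.log(rate) + math.log(2.0) / (2.0 - 1))
```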
A single-cell active pathway analysis tool based on graph neural networks (F. Scarselli (2009) <doi:10.1109/TNN.2008.2005605>; Thomas N. Kipf (2017) <arXiv:1609.02907v4>). It constructs the gene-cell association network, infers pathway activity scores from single-cell data of different modalities, integrates multiple modalities measured on the same cells into one pathway activity score matrix, identifies gene modules activated in each cell phenotype, and parses the association networks of gene modules across multiple cell phenotypes. In addition, a range of visualization functions is provided to display the results.
This package provides a collection of functions for deconvolving bulk sample(s) using an atlas of reference omic signature profiles and a user-selected model. Users can create or extend a reference atlas and also simulate bulk signature profiles of the desired size from the reference cell types. The package includes the cell-type-specific methylation atlas and Illumina EPIC B5 probe IDs that can be used in deconvolution. Additionally, BSmeth2Probe is included to make mapping WGBS data to their probe IDs easier.
This package implements importance sampling from the truncated multivariate normal using the Geweke-Hajivassiliou-Keane (GHK) simulator. Unlike Gibbs sampling, which can get stuck in one truncation sub-region depending on the initial values, this package allows truncation to disjoint regions, such as those created by truncating absolute values. The GHK algorithm uses a simple Cholesky transformation followed by recursive simulation of univariate truncated normals, so there are also no convergence issues. The importance sample is returned along with sampling weights, from which integrals over truncated regions of the multivariate normal can be calculated.
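The GHK recursion can be sketched for a single rectangular region a ≤ x ≤ b, which is simpler than the package's disjoint regions. Write x = μ + Lz, where L is the Cholesky factor of the covariance. Each z_i is drawn from a standard normal truncated to the bounds implied by the earlier z_j, and the weight is the product of the corresponding normal probabilities. The mean weight estimates P(a ≤ X ≤ b), and weighted averages of g(x) estimate integrals over the region. This is a minimal stdlib Python sketch under these assumptions, with hypothetical function names; it is not the package's implementation.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def ghk_sample(mean, cov, lower, upper, n_draws, rng=None):
    """GHK draws of x with lower <= x <= upper, plus importance weights."""
    rng = rng or random.Random()
    L = cholesky(cov)
    d = len(mean)
    draws, weights = [], []
    for _ in range(n_draws):
        z, w = [], 1.0
        for i in range(d):
            shift = mean[i] + sum(L[i][j] * z[j] for j in range(i))
            lo = _STD.cdf((lower[i] - shift) / L[i][i])
            hi = _STD.cdf((upper[i] - shift) / L[i][i])
            w *= hi - lo  # probability of the allowed interval for z_i
            u = lo + rng.random() * (hi - lo)  # inverse-CDF truncated normal draw
            u = min(max(u, 1e-12), 1 - 1e-12)  # keep inv_cdf's argument in (0, 1)
            z.append(_STD.inv_cdf(u))
        draws.append([mean[i] + sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(d)])
        weights.append(w)
    return draws, weights


rho = 0.5
cov = [[1.0, rho], [rho, 1.0]]
draws, weights = ghk_sample([0.0, 0.0], cov, [0.0, 0.0], [math.inf, math.inf],
                            20000, random.Random(1))
# Mean weight estimates P(X1 > 0, X2 > 0); exact value 1/4 + asin(rho)/(2*pi) = 1/3.
print(sum(weights) / len(weights))
```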
Computes conditional multivariate t probabilities, random deviates, and densities. It can also be used to generate missing values in a dataset under a missing at random (MAR) mechanism. The package also includes the Expectation-Maximization (EM), Monte Carlo EM, and Stochastic EM algorithms for imputing missing values in datasets assuming the multivariate t distribution. See Kinyanjui, Tamba, Orawo, and Okenye (2020) <doi:10.3233/mas-200493>, and Kinyanjui, Tamba, and Okenye (2021) <http://www.ceser.in/ceserp/index.php/ijamas/article/view/6726/0> for more details.
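Conditional multivariate t computations rest on a standard identity. Suppose (X1, X2) ~ t_ν(μ, Σ), where X2 has dimension p2, and let d2 = (x2 - μ2)ᵀ Σ22⁻¹ (x2 - μ2). Then X1 | X2 = x2 is again multivariate t with ν + p2 degrees of freedom, location μ1 + Σ12 Σ22⁻¹ (x2 - μ2), and scale matrix ((ν + d2) / (ν + p2)) (Σ11 - Σ12 Σ22⁻¹ Σ21). The Python sketch below applies this identity in the bivariate case. It is a hypothetical helper, not the package's implementation.

```python
def conditional_t_bivariate(mu, sigma, nu, x2):
    """Parameters of X1 | X2 = x2 for a bivariate t (hypothetical helper).

    mu = (mu1, mu2); sigma = ((s11, s12), (s21, s22)) is the scale matrix;
    nu > 0 is the degrees of freedom. Returns (location, scale, df) of the
    univariate t distribution of X1 given X2 = x2.
    """
    mu1, mu2 = mu
    (s11, s12), (s21, s22) = sigma
    d2 = (x2 - mu2) ** 2 / s22              # squared Mahalanobis distance of x2
    loc = mu1 + s12 / s22 * (x2 - mu2)      # same form as the Gaussian case
    df = nu + 1                             # one conditioning variable
    scale = (nu + d2) / (nu + 1) * (s11 - s12 * s21 / s22)
    return loc, scale, df


print(conditional_t_bivariate((0.0, 0.0), ((1.0, 0.5), (0.5, 1.0)), 4.0, 2.0))
```

Unlike the Gaussian case, the conditional scale grows with the distance d2 of the observed value from its mean. Note that this is a scale parameter, not a variance; the variance is scale · df / (df - 2) when df > 2.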
Automatic generation of finite state machine models of dynamic decision-making that both have strong predictive power and are interpretable in human terms. We use an efficient model representation and a genetic algorithm-based estimation process to generate simple deterministic approximations that explain most of the structure of complex stochastic processes. We have applied the software to empirical data and demonstrated its ability to recover known data-generating processes by simulating data with agent-based models and correctly deriving the underlying decision models for multiple agent models and degrees of stochasticity.
This package provides a comprehensive toolkit for single-cell annotation with the CellMarker2.0 database (see Xia Li, Peng Wang, Yunpeng Zhang (2023) <doi:10.1093/nar/gkac947>). It streamlines biological label assignment in single-cell RNA-seq data and facilitates transcriptomic analysis, including preparation of TCGA <https://portal.gdc.cancer.gov/> and GEO <https://www.ncbi.nlm.nih.gov/geo/> datasets, differential expression analysis, and visualization of enrichment analysis results. Additional utility functions support various bioinformatics workflows. See Wei Cui (2024) <doi:10.1101/2024.09.14.609619> for more details.
The Delphi Epidata API provides real-time access to epidemiological surveillance data for influenza, COVID-19, and other diseases in the USA at various geographical resolutions, drawing on official government sources such as the Centers for Disease Control and Prevention (CDC), on Google Trends, and on private partners such as Facebook and Change Healthcare. It is built and maintained by the Carnegie Mellon University Delphi research group. To cite this API: David C. Farrow, Logan C. Brooks, Aaron Rumack, Ryan J. Tibshirani, Roni Rosenfeld (2015). Delphi Epidata API. <https://github.com/cmu-delphi/delphi-epidata>.
This package implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is variable selection and model specification for time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard distributions of errors, selection based on cross-validation, solutions to the fat regression model problem, and more. Models developed in the package are tailored specifically for forecasting purposes. As a result, several methods are provided for producing forecasts from these models and visualising them.
Fast binning of multiple variables using parallel processing. A summary of all the binned variables is generated, which provides the information value, the entropy, an indicator of whether the variable follows a monotonic trend, and more. It supports rebinning of variables to force a monotonic trend, as well as manual binning based on pre-specified cuts. The cut points of the bins are based on conditional inference trees as implemented in the partykit package. The conditional inference framework is described by Hothorn T, Hornik K, Zeileis A (2006) <doi:10.1198/106186006X133933>.
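The information value and monotonic-trend indicator in such a summary are usually built from per-bin weights of evidence: WoE_k = ln(dist_good_k / dist_bad_k) and IV = Σ_k (dist_good_k - dist_bad_k) · WoE_k. The Python sketch below computes them for manually specified cut points. It does not reproduce the package's conditional-inference-tree cut search. Treating y == 0 as "good" and y == 1 as "bad" is an assumption (sign conventions vary), empty cells raise an error instead of being smoothed, and the function names are hypothetical.

```python
import math
from bisect import bisect_left


def bin_counts(x, y, cuts):
    """Count goods (y == 0) and bads (y == 1) in bins (-inf, c1], (c1, c2], ..., (ck, inf)."""
    goods = [0] * (len(cuts) + 1)
    bads = [0] * (len(cuts) + 1)
    for xi, yi in zip(x, y):
        k = bisect_left(cuts, xi)  # index of the first cut >= xi
        if yi == 1:
            bads[k] += 1
        else:
            goods[k] += 1
    return goods, bads


def woe_iv(goods, bads):
    """Weight of evidence per bin and the total information value."""
    total_good, total_bad = sum(goods), sum(bads)
    woe, iv = [], 0.0
    for g, b in zip(goods, bads):
        if g == 0 or b == 0:
            raise ValueError("empty good/bad cell: merge bins or smooth the counts")
        dg, db = g / total_good, b / total_bad
        w = math.log(dg / db)
        woe.append(w)
        iv += (dg - db) * w
    return woe, iv


def is_monotonic(values):
    """True if the sequence is entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


woe, iv = woe_iv([50, 30, 20], [10, 30, 60])
print([round(w, 3) for w in woe], round(iv, 4))  # [1.609, 0.0, -1.099] 1.0832
print(is_monotonic(woe))  # True
```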
This package provides a statistical learning method that tries to find the best set of predictors and interactions between predictors for modeling binary or quantitative response data in a decision tree. Several search algorithms and ensembling techniques are implemented, allowing the method to be fine-tuned to the specific problem. Interactions with quantitative covariables can be properly taken into account by fitting local regression models. Moreover, a variable importance measure for assessing marginal and interaction effects is provided. Implements the procedures proposed by Lau et al. (2024, <doi:10.1007/s10994-023-06488-6>).
Datasets, constants, conversion factors, and utilities for MArine, Riverine, Estuarine, LAcustrine and Coastal science. The package contains, among others: (1) chemical and physical constants and datasets, e.g. atomic weights, gas constants, and the Earth's bathymetry; (2) conversion factors (e.g. gram to mol to liter, barometric units, temperature, salinity); (3) physical functions, e.g. to estimate concentrations of conservative substances, gas transfer and diffusion coefficients, the Coriolis force and gravity; (4) thermophysical properties of seawater, either from the UNESCO polynomial or from the more recent derivation based on a Gibbs function.
Implementation of a framework for cluster analysis with selection of the final number of clusters and an optional variable selection procedure. The package is designed to integrate the results of multiple imputed datasets while accounting for the uncertainty that the imputations introduce in the final results. In addition, the package can also be used for a cluster analysis of the complete cases of a single dataset. The package also includes specific methods to summarize and plot the results. The methods are described in Basagana et al. (2013) <doi:10.1093/aje/kws289>.