Assists researchers in choosing Key Opinion Leaders (KOLs) in a network to help disseminate or encourage adoption of an innovation by other network members. Potential KOL teams are evaluated using the ABCDE framework (Neal et al., 2025 <doi:10.31219/osf.io/3vxy9_v1>), which considers (1) the team members' Availability, (2) the Breadth of the team's network coverage, (3) the Cost of recruiting a team of a given size, and (4) the Diversity of the team's members, which are pooled into (5) a single Evaluation score.
An interface to Neptune, a metadata store for MLOps built for teams that run a lot of experiments. It gives you a single place to log, store, display, organize, compare, and query all your model-building metadata. Neptune is used for: (1) experiment tracking: log, display, organize, and compare ML experiments in a single place; (2) model registry: version, store, manage, and query trained models and model-building metadata; (3) monitoring ML runs live: record and monitor model training, evaluation, or production runs live. For more information see <https://neptune.ai/>.
The functions allow for the numerical evaluation of some commonly used entropy measures, such as Shannon entropy, Rényi entropy, Havrda and Charvat entropy, and Arimoto entropy, at selected parametric values from several well-known and widely used probability distributions. Moreover, the functions also compute the relative loss of these entropies using the truncated distributions. Related works include: Awad, A. M., & Alawneh, A. J. (1987). Application of entropy to a life-time model. IMA Journal of Mathematical Control and Information, 4(2), 143-148. <doi:10.1093/imamci/4.2.143>.
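To illustrate the kind of computation described above, the sketch below numerically evaluates the Shannon and Rényi entropies of an exponential distribution using base R's integrate(); the chosen distribution, parameter values, and function names are assumptions for illustration only, not this package's interface.

```r
## Numerical Shannon and Renyi entropy of an Exp(rate) distribution
## (a minimal sketch using base R; the package's own API may differ).
shannon_entropy <- function(dens, lower, upper) {
  integrate(function(x) {
    fx <- dens(x)
    ifelse(fx > 0, -fx * log(fx), 0)
  }, lower, upper)$value
}

renyi_entropy <- function(dens, alpha, lower, upper) {
  stopifnot(alpha > 0, alpha != 1)
  I <- integrate(function(x) dens(x)^alpha, lower, upper)$value
  log(I) / (1 - alpha)
}

rate <- 2
shannon_entropy(function(x) dexp(x, rate), 0, Inf)   # analytic value: 1 - log(rate)
renyi_entropy(function(x) dexp(x, rate), alpha = 0.5, 0, Inf)
```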
It is a single-cell active pathway analysis tool based on graph neural networks (F. Scarselli (2009) <doi:10.1109/TNN.2008.2005605>; Thomas N. Kipf (2017) <arXiv:1609.02907v4>) that constructs a gene-cell association network, infers pathway activity scores from different single-cell modalities, integrates multiple modalities measured on the same cells into one pathway activity score matrix, identifies gene modules activated in cell phenotypes, and parses association networks of gene modules under multiple cell phenotypes. In addition, abundant visualization functions are provided to display the results.
This package implements the Welch-Satterthwaite approximation for differences of non-standardized t-distributed random variables in both univariate and multivariate settings. The package provides methods for computing effective degrees of freedom and scale parameters, as well as distribution functions for the approximated difference distribution. The methodology extends the classical Welch-Satterthwaite framework from variance combinations to t-distribution differences through careful moment matching. Methods build on the classical Welch-Satterthwaite approach described in Welch (1947) <doi:10.1093/biomet/34.1-2.28> and Satterthwaite (1946) <doi:10.2307/3002019>.
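For orientation, the classical Welch-Satterthwaite effective degrees of freedom for the difference of two sample means, which the moment-matching extension builds on, can be computed as below; this is a generic sketch of the classical formula, not the package's own API.

```r
## Classical Welch-Satterthwaite effective degrees of freedom for the
## difference of two sample means (generic sketch, not the package API).
welch_df <- function(s1, n1, s2, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

welch_df(s1 = 1.2, n1 = 12, s2 = 2.5, n2 = 8)
# Compare with the df reported by t.test(..., var.equal = FALSE)
```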
This package provides a collection of functions designed for analyzing deconvolution of bulk sample(s) using an atlas of reference omic signature profiles and a user-selected model. Users are given the option to create or extend a reference atlas, and also to simulate a bulk signature profile of the desired size from the reference cell types. The package includes a cell-type-specific methylation atlas and Illumina EPIC B5 probe IDs that can be used in deconvolution. Additionally, BSmeth2Probe is included to make mapping WGBS data to probe IDs easier.
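To give a sense of the underlying model, reference-based deconvolution typically solves bulk ≈ atlas × proportions under non-negativity (and often sum-to-one) constraints. The sketch below uses the nnls package on simulated methylation-like data; the simulated atlas, bulk mixture, and use of nnls() are illustrative assumptions, not this package's functions.

```r
## Reference-based deconvolution sketch: estimate cell-type proportions
## from a simulated bulk profile (illustrative; not this package's API).
library(nnls)

set.seed(1)
n_probes <- 200
atlas <- matrix(runif(n_probes * 3), ncol = 3,
                dimnames = list(NULL, c("Tcell", "Bcell", "Monocyte")))
true_prop <- c(0.6, 0.3, 0.1)
bulk <- atlas %*% true_prop + rnorm(n_probes, sd = 0.02)

fit <- nnls(atlas, as.vector(bulk))
est <- fit$x / sum(fit$x)            # rescale so the proportions sum to one
round(setNames(est, colnames(atlas)), 3)
```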
This package implements importance sampling from the truncated multivariate normal using the Geweke-Hajivassiliou-Keane (GHK) simulator. Unlike Gibbs sampling, which can get stuck in one truncation sub-region depending on initial values, this package allows truncation based on disjoint regions that are created by truncation of absolute values. The GHK algorithm uses a simple Cholesky transformation followed by recursive simulation of univariate truncated normals, so there are also no convergence issues. The importance sample is returned along with sampling weights, from which one can calculate integrals over truncated regions for multivariate normals.
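The recursive structure mentioned above can be sketched as follows: after a Cholesky factorization of the covariance, each coordinate is drawn from a univariate truncated standard normal given the previous draws, and the product of the interval probabilities forms the importance weight. This is a generic GHK sketch, not the package's exported interface.

```r
## Minimal GHK importance sampler for N(mu, Sigma) truncated to [lower, upper]
## (generic sketch of the algorithm, not this package's exported functions).
ghk_sample <- function(n, mu, Sigma, lower, upper) {
  d <- length(mu)
  L <- t(chol(Sigma))                       # lower-triangular Cholesky factor
  x <- matrix(NA_real_, n, d)
  w <- numeric(n)
  for (s in seq_len(n)) {
    z <- numeric(d)
    wt <- 1
    for (i in seq_len(d)) {
      part <- if (i > 1) sum(L[i, 1:(i - 1)] * z[1:(i - 1)]) else 0
      a <- (lower[i] - mu[i] - part) / L[i, i]
      b <- (upper[i] - mu[i] - part) / L[i, i]
      pa <- pnorm(a); pb <- pnorm(b)
      wt <- wt * (pb - pa)                  # contribution to the importance weight
      z[i] <- qnorm(pa + runif(1) * (pb - pa))
    }
    x[s, ] <- mu + as.vector(L %*% z)
    w[s] <- wt
  }
  list(sample = x, weights = w)             # mean(weights) estimates P(lower < X < upper)
}

out <- ghk_sample(1000, mu = c(0, 0),
                  Sigma = matrix(c(1, 0.5, 0.5, 1), 2),
                  lower = c(0, 0), upper = c(Inf, Inf))
mean(out$weights)                           # ~ 1/4 + asin(0.5)/(2*pi) = 1/3
```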
Computes conditional multivariate t probabilities, random deviates, and densities. It can also be used to create missing values at random in a dataset, resulting in a missing at random (MAR) mechanism. Built into the package are the Expectation-Maximization (EM), Monte Carlo EM, and Stochastic EM algorithms for imputation of missing values in datasets assuming the multivariate t distribution. See Kinyanjui, Tamba, Orawo, and Okenye (2020) <doi:10.3233/mas-200493>, and Kinyanjui, Tamba, and Okenye (2021) <http://www.ceser.in/ceserp/index.php/ijamas/article/view/6726/0> for more details.
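As a generic illustration of a missing at random (MAR) mechanism of the kind described above, the sketch below makes values in one variable missing with a probability that depends only on another, fully observed variable; the variable names and logistic form are assumptions, and no functions from this package are used.

```r
## Generic MAR sketch: missingness in y depends only on the observed x
## (illustrative; not this package's own missing-data generator).
set.seed(42)
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rt(n, df = 5)              # heavy-tailed errors, as in a t model
p_miss <- plogis(-1 + 1.5 * x)              # missingness probability driven by x
y[runif(n) < p_miss] <- NA

mean(is.na(y))                               # overall missingness rate
```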
This package provides a set of functions to quantify the relationship between development rate and temperature and to build phenological models. The package comprises a set of models and estimated parameters borrowed from a literature review in ectotherms. The methods and literature review are described in Rebaudo et al. (2018) <doi:10.1111/2041-210X.12935>, Rebaudo and Rabhi (2018) <doi:10.1111/eea.12693>, and Regnier et al. (2021) <doi:10.1093/ee/nvab115>. An example can be found in Rebaudo et al. (2017) <doi:10.1007/s13355-017-0480-5>.
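As an example of the kind of rate-temperature relationship such models describe, the sketch below fits a Brière-1 development rate curve, r(T) = a·T·(T − Tmin)·sqrt(Tmax − T), to simulated data with nls(); the simulated data, starting values, and bounds are assumptions, and the package's own fitting functions are not shown.

```r
## Fit a Briere-1 development rate model to simulated data
## (illustrative sketch; not this package's interface).
briere1 <- function(T, a, Tmin, Tmax) {
  r <- a * T * (T - Tmin) * sqrt(pmax(Tmax - T, 0))
  ifelse(T > Tmin & T < Tmax, r, 0)
}

set.seed(7)
temp <- seq(12, 34, by = 2)
rate <- briere1(temp, a = 2e-4, Tmin = 10, Tmax = 35) + rnorm(length(temp), sd = 5e-4)

fit <- nls(rate ~ briere1(temp, a, Tmin, Tmax),
           start = list(a = 1e-4, Tmin = 8, Tmax = 36),
           algorithm = "port",
           lower = c(a = 1e-6, Tmin = 0, Tmax = 34.5),
           upper = c(a = 1e-2, Tmin = 12, Tmax = 45))
coef(fit)
```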
Automatic generation of finite state machine models of dynamic decision-making that both have strong predictive power and are interpretable in human terms. We use an efficient model representation and a genetic algorithm-based estimation process to generate simple deterministic approximations that explain most of the structure of complex stochastic processes. We have applied the software to empirical data and demonstrated its ability to recover known data-generating processes by simulating data with agent-based models and correctly deriving the underlying decision models for multiple agent models and degrees of stochasticity.
The Delphi Epidata API provides real-time access to epidemiological surveillance data for influenza, COVID-19, and other diseases for the USA at various geographical resolutions, both from official government sources such as the Centers for Disease Control and Prevention (CDC) and Google Trends, and from private partners such as Facebook and Change Healthcare. It is built and maintained by the Carnegie Mellon University Delphi research group. To cite this API: David C. Farrow, Logan C. Brooks, Aaron Rumack, Ryan J. Tibshirani, Roni Rosenfeld (2015). Delphi Epidata API. <https://github.com/cmu-delphi/delphi-epidata>.
This package provides a comprehensive toolkit for single-cell annotation with the CellMarker2.0 database (see Xia Li, Peng Wang, Yunpeng Zhang (2023) <doi:10.1093/nar/gkac947>). It streamlines biological label assignment in single-cell RNA-seq data and facilitates transcriptomic analysis, including preparation of TCGA (<https://portal.gdc.cancer.gov/>) and GEO (<https://www.ncbi.nlm.nih.gov/geo/>) datasets, differential expression analysis, and visualization of enrichment analysis results. Additional utility functions support various bioinformatics workflows. See Wei Cui (2024) <doi:10.1101/2024.09.14.609619> for more details.
This package implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is variable selection and model specification for time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard distributions of errors, selection based on cross-validation, solutions to the fat regression model problem, and more. Models developed in the package are tailored specifically for forecasting purposes, so several methods are provided for producing forecasts from these models and visualising them.
This package provides a statistical learning method that tries to find the best set of predictors and interactions between predictors for modeling binary or quantitative response data in a decision tree. Several search algorithms and ensembling techniques are implemented, allowing the method to be fine-tuned to the specific problem. Interactions with quantitative covariables can be properly taken into account by fitting local regression models. Moreover, a variable importance measure for assessing marginal and interaction effects is provided. Implements the procedures proposed by Lau et al. (2024) <doi:10.1007/s10994-023-06488-6>.
Fast binning of multiple variables using parallel processing. A summary of all the binned variables is generated, providing the information value, entropy, an indicator of whether the variable follows a monotonic trend, and more. It supports rebinning of variables to force a monotonic trend as well as manual binning based on pre-specified cuts. The cut points of the bins are based on conditional inference trees as implemented in the partykit package. The conditional inference framework is described in Hothorn T, Hornik K, Zeileis A (2006) <doi:10.1198/106186006X133933>.
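For reference, the information value reported per binned variable is typically computed from weight-of-evidence as IV = Σ (pct_good − pct_bad)·ln(pct_good / pct_bad) over the bins. The sketch below computes it for one already-binned variable; it is a generic illustration of the formula, not this package's implementation.

```r
## Weight of evidence and information value for one binned variable
## (generic sketch; not this package's implementation).
iv_from_bins <- function(bin, bad) {
  tab  <- table(bin, bad)                   # columns: "0" = good, "1" = bad
  good <- tab[, "0"] / sum(tab[, "0"])
  badp <- tab[, "1"] / sum(tab[, "1"])
  woe  <- log(good / badp)
  data.frame(bin = rownames(tab), woe = woe, iv = (good - badp) * woe,
             row.names = NULL)
}

set.seed(3)
x   <- rnorm(1000)
bad <- rbinom(1000, 1, plogis(-1 + x))       # default probability rises with x
bin <- cut(x, breaks = quantile(x, probs = seq(0, 1, 0.2)), include.lowest = TRUE)
res <- iv_from_bins(bin, bad)
res
sum(res$iv)                                  # total information value of the variable
```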
Calculates the mean cumulative count (MCC) to estimate the expected cumulative number of recurrent events per person over time in the presence of competing risks and censoring. Implements both the Dong-Yasui equation method and the sum of cumulative incidence method described in Dong et al. (2015) <doi:10.1093/aje/kwu289>. Supports inverse probability weighting for causal inference as outlined in Gaber et al. (2023) <doi:10.1093/aje/kwad031>. Provides S3 methods for printing, summarizing, plotting, and extracting results. Handles grouped analyses and integrates with ggplot2 <https://ggplot2.tidyverse.org/> for visualization.
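As a rough sketch of the general idea behind such estimators (survival-weighted counts of recurrent events among those still at risk), the code below computes, on toy data, MCC(t) = Σ over recurrent-event times u ≤ t of S(u−)·(events at u)/(at risk at u), with S the Kaplan-Meier survival from the competing terminal event. This is a simplified illustration under assumptions, not the package's implementation of the cited methods.

```r
## Simplified mean-cumulative-count sketch on toy long-format data
## (illustrative only; not this package's implementation).
library(survival)

dat <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3, 4, 4),
  time   = c(2, 5, 8, 3, 6, 4, 1, 7),
  status = c(1, 1, 0, 1, 2, 2, 1, 0)   # 1 = recurrent event, 2 = death, 0 = censored
)

## follow-up per subject = last observed time; death indicator from that terminal row
fu <- aggregate(time ~ id, dat, max)
fu$death <- dat$status[match(paste(fu$id, fu$time), paste(dat$id, dat$time))] == 2

km <- survfit(Surv(time, death) ~ 1, data = fu)   # Kaplan-Meier survival from death
surv_at <- stepfun(km$time, c(1, km$surv))        # right-continuous step function S(t)

times <- sort(unique(dat$time[dat$status == 1]))
mcc <- cumsum(sapply(times, function(u) {
  at_risk <- sum(fu$time >= u)                    # still under observation just before u
  events  <- sum(dat$time == u & dat$status == 1)
  surv_at(u - 1e-8) * events / at_risk            # weight by S(u-)
}))
data.frame(time = times, MCC = mcc)
```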
This package provides access to teaching materials for various statistics courses, including R and Python programs, Shiny apps, data, and PDF/HTML documents. These materials are stored on the Internet as a ZIP file (e.g., in a GitHub repository) and can be downloaded and displayed or run locally. The content of the ZIP file is temporarily or permanently stored. By default, the package uses the GitHub repository sigbertklinke/mmstat4.data. Additionally, the package includes association_measures.R from the archived package ryouready by Mark Heckman and some auxiliary functions.
Datasets, constants, conversion factors, and utilities for MArine, Riverine, Estuarine, LAcustrine and Coastal science. The package contains, among others: (1) chemical and physical constants and datasets, e.g. atomic weights, gas constants, the earth's bathymetry; (2) conversion factors (e.g. gram to mol to liter, barometric units, temperature, salinity); (3) physical functions, e.g. to estimate concentrations of conservative substances, gas transfer and diffusion coefficients, the Coriolis force and gravity; (4) thermophysical properties of seawater, either from the UNESCO polynomial or from the more recent derivation based on a Gibbs function.
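As a small example of the kind of physical helper such a package provides, the Coriolis parameter is f = 2Ω·sin(latitude) with Ω ≈ 7.2921×10⁻⁵ rad s⁻¹. The sketch below is a generic computation of this quantity, not a call to the package's own function.

```r
## Coriolis parameter f = 2 * Omega * sin(latitude), in s^-1
## (generic sketch; the package provides its own function for this).
coriolis_f <- function(lat_deg) {
  omega <- 7.2921e-5                      # Earth's angular velocity, rad/s
  2 * omega * sin(lat_deg * pi / 180)
}

coriolis_f(c(0, 30, 45, 60, 90))
```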
Advanced methods for quantitative environmental risk assessment using Bayesian inference on several types of toxicological data: binary (e.g., survival, mobility), count (e.g., reproduction), and continuous (e.g., growth in length or weight). Estimation procedures can be used without a deep knowledge of their underlying probabilistic model or inference methods. Rather, they were designed to behave as well as possible without requiring the user to provide values for obscure parameters. That said, the models can also be used as a first step to tailor new models for more specific situations.
Implementation of a framework for cluster analysis with selection of the final number of clusters and an optional variable selection procedure. The package is designed to integrate the results of multiple imputed datasets while accounting for the uncertainty that the imputations introduce in the final results. In addition, the package can also be used for a cluster analysis of the complete cases of a single dataset. The package also includes specific methods to summarize and plot the results. The methods are described in Basagana et al. (2013) <doi:10.1093/aje/kws289>.
This package provides functions aiming to facilitate the analysis of the structure of animal acoustic signals in R. warbleR makes use of the basic sound analysis tools from the packages tuneR and seewave, and offers new tools for exploring and quantifying acoustic signal structure. The package allows users to organize and manipulate multiple sound files, create spectrograms of complete recordings or individual signals in different formats, run several measures of acoustic structure, and characterize different structural levels in acoustic signals (Araya-Salas et al. 2016 <doi:10.1111/2041-210X.12624>).
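A typical workflow built on the tuneR and seewave tools mentioned above reads a recording and plots its spectrogram; in the sketch below "recording.wav" is a placeholder file name, and the warbleR-specific batch functions are not shown.

```r
## Read a recording and plot its spectrogram with tuneR/seewave
## (minimal sketch; "recording.wav" is a placeholder file name).
library(tuneR)
library(seewave)

wave <- readWave("recording.wav")
spectro(wave, wl = 512, ovlp = 50)        # spectrogram, 512-sample window, 50% overlap
meanspec(wave)                            # mean frequency spectrum of the whole recording
```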
MEM, Marker Enrichment Modeling, automatically generates and displays quantitative labels for cell populations that have been identified from single-cell data. The input for MEM is a dataset that has pre-clustered or pre-gated populations with cells in rows and features in columns. Labels convey a list of measured features and the features' levels of relative enrichment in each population. MEM can be applied to a wide variety of data types and can compare MEM labels from flow cytometry, mass cytometry, single-cell RNA-seq, and spectral flow cytometry using RMSD.
Contrast trees represent a new approach for assessing the accuracy of many types of machine learning estimates that are not amenable to standard (cross) validation methods; see "Contrast trees and distribution boosting", Jerome H. Friedman (2020) <doi:10.1073/pnas.1921562117>. In situations where inaccuracies are detected, boosted contrast trees can often improve performance. Functions are provided to build such trees in addition to a special case, distribution boosting, an assumption-free method for estimating the full probability distribution of an outcome variable given any set of joint input predictor variable values.
This package provides tools for estimating the Remaining Useful Life (RUL) of degrading systems using linear mixed-effects models and creating a health index. It supports both univariate and multivariate degradation signals. For multivariate inputs, the signals are merged into a univariate health index prior to modeling. Linear and exponential degradation trajectories are supported (the latter using a log transformation). Remaining Useful Life (RUL) distributions are estimated using Bayesian updating for new units, enabling on-site predictive maintenance. Based on the methodology of Liu and Huang (2016) <doi:10.1109/TASE.2014.2349733>.
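To illustrate the basic idea of a linear degradation model, the sketch below fits a linear trend to a simulated health index for a single unit and extrapolates it to a failure threshold to get a point estimate of remaining useful life; the threshold, data, and plain lm() fit are assumptions for illustration, not the package's mixed-effects and Bayesian machinery.

```r
## Point-estimate RUL from a linear degradation trend (illustrative sketch;
## the package itself uses mixed-effects models and Bayesian updating).
set.seed(10)
time   <- 0:50
signal <- 0.2 + 0.05 * time + rnorm(length(time), sd = 0.1)   # simulated health index
threshold <- 4                                                # assumed failure level

fit  <- lm(signal ~ time)
beta <- coef(fit)
t_fail <- (threshold - beta[1]) / beta[2]     # time at which the trend crosses the threshold
rul    <- t_fail - max(time)                  # remaining useful life from the last observation
unname(rul)
```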