Impute the survival times for censored observations based on their conditional survival distributions derived from the Kaplan-Meier estimator. CondiS can replace the censored observations with the best approximations from the statistical model, allowing for direct application of machine learning-based methods. When covariates are available, CondiS is extended by incorporating the covariate information through machine learning-based regression modeling ('CondiS_X'), which can further improve the imputed survival time.
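As a rough illustration of the Kaplan-Meier step, the sketch below replaces a censored time c with the conditional mean E[T | T > c], integrating the KM curve up to a truncation time `t_max`. The function names and the truncation rule are assumptions for illustration, not the CondiS API, and the covariate-based CondiS_X extension is not shown.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (event time, S(t)) pairs; events[i] is 1 for an event, 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= removed
    return steps


def surv_at(steps: List[Tuple[float, float]], t: float) -> float:
    """Evaluate the right-continuous KM step function at time t."""
    s = 1.0
    for step_time, step_surv in steps:
        if step_time > t:
            break
        s = step_surv
    return s


def impute_censored(c: float, steps: List[Tuple[float, float]], t_max: float) -> float:
    """E[T | T > c] = c + (1 / S(c)) * integral of S(u) du from c to t_max (hypothetical helper)."""
    s_c = surv_at(steps, c)
    if s_c == 0.0:
        return c
    knots = [c] + [t for t, _ in steps if c < t < t_max] + [t_max]
    # S is constant between consecutive knots, so the integral is a sum of rectangles.
    area = sum(surv_at(steps, a) * (b - a) for a, b in zip(knots, knots[1:]))
    return c + area / s_c
```

For example, `impute_censored(2, kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]), 4)` returns 3.5, the average of the two event times observed after the censoring time. If the largest observed time is censored, the KM curve never reaches zero and the result depends on the choice of `t_max`.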
Individual gene expression patterns are encoded into a series of eigenvector patterns ('WGCNA' package). Using the framework of linear model-based differential expression comparisons ('limma' package), time-course expression patterns for genes in different conditions are compared and analyzed for significant pattern changes. For reference, see: Greenham K, Sartor RC, Zorich S, Lou P, Mockler TC and McClung CR. eLife. 2020 Sep 30;9(4). <doi:10.7554/eLife.58993>.
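To make the eigenvector encoding concrete, here is a minimal sketch, assuming the WGCNA-style convention that a module's eigengene is the first principal component of the standardized expression matrix. The helper `eigengene` is hypothetical; the package itself delegates this step to 'WGCNA' and the differential comparison to 'limma'.

```python
import numpy as np


def eigengene(expr: np.ndarray) -> np.ndarray:
    """First principal component of a genes x time-points matrix after standardizing each gene."""
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, ddof=1, keepdims=True)
    # Rows of vt are right singular vectors, i.e. patterns over the time points.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    pattern = vt[0]
    # SVD signs are arbitrary; orient the pattern with the average standardized profile.
    if pattern @ z.mean(axis=0) < 0:
        pattern = -pattern
    return pattern
```

Genes whose profiles are positive rescalings of one another standardize to the same row, so their eigengene is exactly that shared (normalized) pattern.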
This package contains logic for computing sparse principal components via the EESPCA method, which is based on an approximation of the eigenvector/eigenvalue identity. Includes logic to support execution of the TPower and rifle sparse PCA methods, as well as logic to estimate the sparsity parameters used by EESPCA, TPower and rifle via cross-validation to minimize the out-of-sample reconstruction error. H. Robert Frost (2021) <doi:10.1080/10618600.2021.1987254>.
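The eigenvector/eigenvalue identity that EESPCA approximates states that |v_i[j]|^2 * prod_{k != i} (lambda_i - lambda_k) = prod_k (lambda_i - mu_k(M_j)), where M_j is the symmetric matrix with row and column j removed and mu_k(M_j) are its eigenvalues. The sketch below evaluates the exact identity with NumPy for a small matrix with distinct eigenvalues; it is not the EESPCA approximation itself, and the function name is hypothetical.

```python
import numpy as np


def eigvec_sq_via_identity(A: np.ndarray, i: int) -> np.ndarray:
    """|v_i[j]|^2 for the i-th eigenvector (ascending order) of a symmetric A with distinct eigenvalues."""
    n = A.shape[0]
    lam = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
    denom = np.prod([lam[i] - lam[k] for k in range(n) if k != i])
    out = np.empty(n)
    for j in range(n):
        # Minor M_j: delete row j and column j.
        minor = np.delete(np.delete(A, j, axis=0), j, axis=1)
        out[j] = np.prod(lam[i] - np.linalg.eigvalsh(minor)) / denom
    return out
```

Only eigenvalues are needed, which is what makes the identity attractive as a starting point for approximating eigenvector loadings.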
This package implements a simple, likelihood-based estimation of the reproduction number (R0) using a branching process with a Poisson likelihood. This model requires knowledge of the serial interval distribution and the dates of symptom onset. Infectiousness is determined by weighting R0 by the probability mass function of the serial interval on the corresponding day. It is a simplified version of the model introduced by Cori et al. (2013) <doi:10.1093/aje/kwt133>.
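Under this model, daily incidence I_t is Poisson with mean R0 * Lambda_t, where Lambda_t = sum_{s>=1} w_s * I_{t-s} weights past cases by the serial interval pmf w. Maximizing the likelihood in R0 gives a closed form, sketched below; the function names and the choice to drop days with Lambda_t = 0 are assumptions, not the package's interface.

```python
from typing import List, Sequence


def total_infectiousness(incidence: Sequence[float], si_pmf: Sequence[float]) -> List[float]:
    """Lambda_t = sum_{s>=1} w_s * I_{t-s}, where si_pmf[s-1] = P(serial interval = s days)."""
    lam = []
    for t in range(len(incidence)):
        lam.append(sum(si_pmf[s - 1] * incidence[t - s]
                       for s in range(1, min(t, len(si_pmf)) + 1)))
    return lam


def r0_mle(incidence: Sequence[float], si_pmf: Sequence[float]) -> float:
    """Maximum-likelihood R0 under I_t ~ Poisson(R0 * Lambda_t), using only days with Lambda_t > 0."""
    lam = total_infectiousness(incidence, si_pmf)
    used = [t for t in range(len(incidence)) if lam[t] > 0]
    # Setting d/dR0 of sum_t [I_t * log(R0 * Lambda_t) - R0 * Lambda_t] to zero gives this ratio.
    return sum(incidence[t] for t in used) / sum(lam[t] for t in used)
```

With a serial interval of exactly one day (`si_pmf=[1.0]`), the doubling series `[1, 2, 4, 8]` yields an estimate of 2.0.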
We describe fifteen different splice site sequence encoding schemes that have been used in earlier studies to map splice site sequences into numeric feature vectors. These encoding schemes can also be used to transform other nucleotide sequences into numeric form, provided the sequences are of equal length. They should help computational biologists working on classification (binary or multiclass) or prediction problems involving nucleic acid sequences of equal length.
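One-hot encoding is one of the simplest schemes of this kind: each nucleotide becomes a 4-bit indicator, so a sequence of length L maps to a vector of length 4L. The sketch below is a generic Python version, assuming an A, C, G, T ordering and rejecting ambiguous bases; it is not claimed to match any particular scheme in the package.

```python
from typing import List, Sequence

ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}


def one_hot(seq: str) -> List[int]:
    """Concatenate a 4-bit indicator per nucleotide (A, C, G, T order)."""
    vec: List[int] = []
    for base in seq.upper():
        vec.extend(ONE_HOT[base])  # raises KeyError for ambiguous bases such as 'N'
    return vec


def encode_all(seqs: Sequence[str]) -> List[List[int]]:
    """Encode equal-length sequences so every feature vector has length 4 * L."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same length")
    return [one_hot(s) for s in seqs]
```

The equal-length check matters because downstream classifiers expect every sample to have the same number of features.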
Fair machine learning regression models which take sensitive attributes into account in model estimation. Currently implements Komiyama et al. (2018) <http://proceedings.mlr.press/v80/komiyama18a/komiyama18a.pdf>, Zafar et al. (2019) <https://www.jmlr.org/papers/volume20/18-262/18-262.pdf> and my own approach from Scutari, Panero and Proissl (2022) <https://link.springer.com/content/pdf/10.1007/s11222-022-10143-w.pdf>, which uses ridge regression to enforce fairness.
Routines that allow the user to run goodness-of-fit tests based on empirical distribution functions for formal model evaluation in a general likelihood model. In addition, functions are provided to test a sample against Normal or Gamma distributions, validate the normality assumption in a linear model, and examine the appropriateness of a Gamma distribution in generalized linear models with various link functions. Michael Arthur Stephens (1976) <http://www.jstor.org/stable/2958206>.
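As an example of an EDF statistic, the sketch below computes the Cramér-von Mises statistic W^2 = sum_i (z_(i) - (2i - 1)/(2n))^2 + 1/(12n) for normality, where z_(i) is the fitted normal CDF at the i-th ordered observation and the mean and standard deviation are estimated from the sample. Stephens (1976) gives modified statistics and critical values for this estimated-parameter case; those are not reproduced here, and the function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence


def cramer_von_mises_normal(x: Sequence[float]) -> float:
    """Cramér-von Mises W^2 against a normal with mean and SD estimated from x."""
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    # Probability integral transform of the ordered sample.
    z = sorted(fitted.cdf(v) for v in x)
    w2 = sum((zi - (2 * i - 1) / (2 * n)) ** 2 for i, zi in enumerate(z, start=1))
    return w2 + 1.0 / (12 * n)
```

Larger values indicate a worse fit; a right-skewed sample produces a noticeably larger W^2 than a sample placed at normal quantiles.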
Implement a coherent and flexible protocol for animal color tagging. GenTag provides a simple computational routine with low CPU usage to create color sequences for animal tags. First, a single-color tag sequence is created from an algorithm selected by the user, followed by verification that the combination is unique. Three methods to produce color tag sequences are provided. Users can modify the core of the main function to allow a wide range of applications.
This package provides causal inference with interactive fixed-effect models. It imputes counterfactuals for each treated unit using control group information based on a linear interactive fixed effects model that incorporates unit-specific intercepts interacted with time-varying coefficients. This method generalizes the synthetic control method to the case of multiple treated units and variable treatment periods, and improves efficiency and interpretability. This version supports unbalanced panels and implements the matrix completion method.
Computes the sample probability value (p-value) for the estimated coefficient from a standard genome-wide univariate regression. It computes the exact finite-sample p-value under the assumption that the measured phenotype (the dependent variable in the regression) has a known Bernoulli-normal mixture distribution. Finite-sample genome-wide regression p-values (Gwrpv) with a non-normally distributed phenotype (Gregory Connor and Michael O'Neill, bioRxiv 204727 <doi:10.1101/204727>).
This package provides tools for the development of packages related to General Transit Feed Specification (GTFS) files. It establishes a standard for representing GTFS feeds using R data types, provides fast and flexible functions to read and write GTFS feeds while adhering to this standard, and defines a basic gtfs class that is meant to be extended by packages that depend on it. It also offers utility functions for checking the structure of GTFS objects.
Semiparametric regression models on the cumulative incidence function for interval-censored competing risks data as described in Bakoyannis, Yu, & Yiannoutsos (2017) <doi:10.1002/sim.7350> and the models with missing event types as described in Park, Bakoyannis, Zhang, & Yiannoutsos (2021) <doi:10.1093/biostatistics/kxaa052>. The models covered include the proportional subdistribution hazards model (Fine-Gray model), the proportional odds model, and other models that belong to the class of semiparametric generalized odds rate transformation models.
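The generalized odds rate class can be written (assumed here, following a common formulation) as F(t | Z) = 1 - exp{-G(exp(beta'Z) * Lambda(t); r)} with G(x; r) = log(1 + r x)/r and G(x; 0) = x, so r = 0 gives the Fine-Gray model and r = 1 the proportional odds model. The sketch only evaluates the model for a given baseline cumulative hazard Lambda(t); it does not perform interval-censored estimation, and the names are hypothetical.

```python
import math


def g_transform(x: float, r: float) -> float:
    """G(x; r) = log(1 + r x) / r for r > 0, with the limit G(x; 0) = x."""
    return x if r == 0 else math.log1p(r * x) / r


def cif(baseline_cumhaz: float, linear_predictor: float, r: float) -> float:
    """Cumulative incidence F(t | Z) = 1 - exp(-G(exp(beta'Z) * Lambda(t); r))."""
    return 1.0 - math.exp(-g_transform(math.exp(linear_predictor) * baseline_cumhaz, r))
```

With r = 1 the odds F/(1 - F) equal exp(beta'Z) * Lambda(t), so raising the linear predictor by one multiplies the odds by e at every time point.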
This package provides functions implementing multivariate state space models for time series analysis and forecasting. The focus of the package is on multivariate models, such as Vector Exponential Smoothing and Vector ETS (Error-Trend-Seasonal model). It currently includes Vector Exponential Smoothing (VES, de Silva et al., 2010, <doi:10.1177/1471082X0901000401>), Vector ETS (Svetunkov et al., 2023, <doi:10.1016/j.ejor.2022.04.040>) and a simulation function for VES.
This package provides tools to create an interactive web-based visualization of a topic model that has been fit to a corpus of text data using Latent Dirichlet Allocation (LDA). Given the estimated parameters of the topic model, it computes various summary statistics as input to an interactive visualization built with D3.js that is accessed via a browser. The goal is to help users interpret the topics in their LDA topic model.
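One widely used summary for ranking terms within a topic blends a term's within-topic probability phi_kw with its lift phi_kw / p_w (p_w being the term's overall corpus probability) through a weight lambda in [0, 1]. The sketch below computes such a relevance score from toy inputs; the function names are hypothetical, and this is not the package's full set of summary statistics.

```python
import math
from typing import Dict, List


def relevance(phi_kw: float, p_w: float, lam: float) -> float:
    """lam * log(phi_kw) + (1 - lam) * log(phi_kw / p_w)."""
    return lam * math.log(phi_kw) + (1.0 - lam) * math.log(phi_kw / p_w)


def top_terms(phi_k: Dict[str, float], p: Dict[str, float], lam: float, n: int) -> List[str]:
    """Rank a topic's terms by relevance and return the n highest."""
    scores = {w: relevance(phi_k[w], p[w], lam) for w in phi_k}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Setting lambda = 1 ranks terms purely by within-topic probability, while lambda = 0 ranks by lift and favors rarer, topic-specific terms.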
This package provides a framework that should improve reproducibility and transparency in data processing. It provides functionality such as automatic metadata creation and management, rudimentary quality management, data caching, workflow management and data aggregation. * The title is a wish, not a promise. By no means do we expect this package to deliver everything that is needed to achieve full reproducibility and transparency, but we believe that it supports efforts in this direction.
Probabilistic Regression Trees (PRTree). Functions for fitting and predicting PRTree models with some adaptations to handle missing values. The main calculations are performed in FORTRAN, resulting in highly efficient algorithms. This package's implementation is based on the PRTree methodology described in Alkhoury, S.; Devijver, E.; Clausel, M.; Tami, M.; Gaussier, E.; Oppenheim, G. (2020) - "Smooth And Consistent Probabilistic Regression Trees" <https://proceedings.neurips.cc/paper_files/paper/2020/file/8289889263db4a40463e3f358bb7c7a1-Paper.pdf>.
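The key idea behind probabilistic regression trees is soft routing: instead of sending x left or right deterministically, each split assigns a probability, here assumed to be a Gaussian CDF Phi((threshold - x)/sigma), and the prediction is the probability-weighted average of the leaf values. The depth-one sketch below is hypothetical and omits fitting as well as the package's missing-value handling.

```python
from statistics import NormalDist


def soft_stump_predict(x: float, threshold: float, sigma: float,
                       gamma_left: float, gamma_right: float) -> float:
    """One-split probabilistic tree: leaf values weighted by routing probabilities."""
    # Probability of routing x to the left leaf; sigma controls how soft the split is.
    p_left = NormalDist().cdf((threshold - x) / sigma)
    return p_left * gamma_left + (1.0 - p_left) * gamma_right
```

At the threshold the prediction is the midpoint of the two leaf values, and as sigma shrinks toward zero the model approaches an ordinary hard split, which is what makes the predictor smooth in x.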
Helper functions for MASCOTNUM algorithm template, for design of numerical experiments practice: algorithm template parser to support MASCOTNUM specification <https://www.gdr-mascotnum.fr/template.html>, ask & tell decoupling injection (inspired by <https://search.r-project.org/CRAN/refmans/sensitivity/html/decoupling.html>) to use "crimped" algorithms (like uniroot(), optim(), ...) from outside R, basic template examples: Brent algorithm for one-dimensional root finding and L-BFGS-B from base optim().
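Ask & tell decoupling inverts control: the algorithm proposes a point ("ask"), the caller evaluates it, possibly in another process or language, and reports the result back ("tell"). The Python generator below sketches this for bisection root finding, chosen instead of Brent's method for brevity; the names are hypothetical and this is not the package's template format.

```python
from typing import Callable, Generator


def bisection(lo: float, hi: float, tol: float = 1e-10) -> Generator[float, float, float]:
    """Ask/tell root finder: yields a point to evaluate and receives f(point) via send()."""
    f_lo = yield lo
    f_hi = yield hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = yield mid
        # Keep the half-interval whose endpoints still bracket a sign change.
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def drive(solver: Generator[float, float, float], f: Callable[[float], float]) -> float:
    """Stand-in for the external evaluator: answer each ask with f(x) until the solver finishes."""
    x = next(solver)
    try:
        while True:
            x = solver.send(f(x))
    except StopIteration as done:
        return done.value
```

Here `drive` plays the role of the outside process; any loop that can call `send` with an evaluated value can drive the same generator, so the solver never needs direct access to f.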
This package provides a framework for estimating difference-in-differences with unpoolable data, based on Karim, Webb, Austin, and Strumpf (2024) <doi:10.48550/arXiv.2403.15910>. It supports common or staggered adoption, multiple groups, and the inclusion of covariates. It also computes p-values for the aggregate average treatment effect on the treated via the randomization inference procedure described in MacKinnon and Webb (2020) <doi:10.1016/j.jeconom.2020.04.024>.
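A bare-bones version of the idea, assuming each group (silo) reports only its own pre-to-post change in the outcome, is sketched below: the ATT estimate is the mean change in treated groups minus the mean change in control groups, and a randomization p-value compares it with every reassignment of treatment to the same number of groups. This simple permutation scheme is an illustrative assumption, not the exact procedure of Karim et al. (2024) or MacKinnon and Webb (2020), and it ignores covariates and staggered adoption.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Iterable


def did_estimate(deltas: Dict[str, float], treated: Iterable[str]) -> float:
    """Mean post-minus-pre change in treated groups minus the mean change in control groups."""
    treated_set = set(treated)
    treated_changes = [d for g, d in deltas.items() if g in treated_set]
    control_changes = [d for g, d in deltas.items() if g not in treated_set]
    return mean(treated_changes) - mean(control_changes)


def randomization_pvalue(deltas: Dict[str, float], treated: Iterable[str]) -> float:
    """Share of all same-size treatment reassignments whose |estimate| is at least the observed one."""
    treated_set = set(treated)
    observed = abs(did_estimate(deltas, treated_set))
    draws = [abs(did_estimate(deltas, combo))
             for combo in combinations(sorted(deltas), len(treated_set))]
    # Small tolerance so the observed assignment always counts itself.
    return sum(d >= observed - 1e-12 for d in draws) / len(draws)
```

With five groups and one treated group there are only five possible assignments, so the smallest attainable p-value is 1/5; this coarseness is one reason designs with few treated groups call for careful inference.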
This package provides tools for the analysis of complex survey samples. The provided features include: summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples; variances by Taylor series linearisation or replicate weights; post-stratification, calibration, and raking; two-phase subsampling designs; graphics; PPS sampling without replacement; principal components, and factor analysis.
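Unequal weighting is the core of design-based estimation: each sampled unit with inclusion probability pi_i stands for 1/pi_i population units. The sketch below computes the Horvitz-Thompson total and the ratio (Hájek) mean for hypothetical data; it omits stratification, clustering, and variance estimation, which the package handles.

```python
from typing import Sequence


def horvitz_thompson_total(y: Sequence[float], pi: Sequence[float]) -> float:
    """Estimate the population total as sum_i y_i / pi_i."""
    return sum(yi / p for yi, p in zip(y, pi))


def hajek_mean(y: Sequence[float], pi: Sequence[float]) -> float:
    """Weighted mean sum_i w_i y_i / sum_i w_i with design weights w_i = 1 / pi_i."""
    weights = [1.0 / p for p in pi]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
```

For `y = [1, 2, 3]` with inclusion probabilities `[0.5, 0.5, 0.25]`, the third unit carries twice the weight of the others, giving a total of 18 and a mean of 2.25 rather than the unweighted 2.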
Multivariate regression methodologies including classical reduced-rank regression (RRR) studied by Anderson (1951) <doi:10.1214/aoms/1177729580> and Reinsel and Velu (1998) <doi:10.1007/978-1-4757-2853-8>, reduced-rank regression via adaptive nuclear norm penalization proposed by Chen et al. (2013) <doi:10.1093/biomet/ast036> and Mukherjee et al. (2015) <doi:10.1093/biomet/asx080>, robust reduced-rank regression (R4) proposed by She and Chen (2017) <doi:10.1093/biomet/asx032>, generalized/mixed-response reduced-rank regression (mRRR) proposed by Luo et al. (2018) <doi:10.1016/j.jmva.2018.04.011>, row-sparse reduced-rank regression (SRRR) proposed by Chen and Huang (2012) <doi:10.1080/01621459.2012.734178>, reduced-rank regression with a sparse singular value decomposition (RSSVD) proposed by Chen et al. (2012) <doi:10.1111/j.1467-9868.2011.01002.x> and sparse and orthogonal factor regression (SOFAR) proposed by Uematsu et al. (2019) <doi:10.1109/TIT.2019.2909889>.
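Classical reduced-rank regression with an identity weight matrix can be computed by projecting the least-squares coefficients onto the leading right singular vectors of the fitted values, B_r = B_ols V_r V_r'. The NumPy sketch below shows that construction only; the penalized, robust, and sparse variants listed above are not covered, and the function name is hypothetical.

```python
import numpy as np


def reduced_rank_regression(X: np.ndarray, Y: np.ndarray, rank: int) -> np.ndarray:
    """Rank-constrained coefficients B_r = B_ols V_r V_r^T (identity weight matrix)."""
    b_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Leading right singular vectors of the fitted values span the best rank-r response subspace.
    _, _, vt = np.linalg.svd(X @ b_ols, full_matrices=False)
    v_r = vt[:rank].T
    return b_ols @ v_r @ v_r.T
```

When the true coefficient matrix already has the requested rank and there is no noise, the projection leaves it unchanged; with noise, the projection discards the weakest response directions.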
Infrastructure for estimating probabilistic distributional regression models in a Bayesian framework. The distribution parameters may capture location, scale, shape, etc. and every parameter may depend on complex additive terms (fixed, random, smooth, spatial, etc.) similar to a generalized additive model. The conceptual and computational framework is introduced in Umlauf, Klein, Zeileis (2019) <doi:10.1080/10618600.2017.1407325> and the R package in Umlauf, Klein, Simon, Zeileis (2021) <doi:10.18637/jss.v100.i04>.
Prognostic enrichment is a clinical trial strategy of evaluating an intervention in a patient population with a higher rate of the unwanted event than the broader patient population (R. Temple (2010) <DOI:10.1038/clpt.2010.233>). A higher event rate translates into a smaller required sample size for the clinical trial, which can have both practical and ethical advantages. This package is a tool to help evaluate biomarkers for prognostic enrichment of clinical trials.
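The sample-size effect can be seen with the standard normal-approximation formula for comparing two proportions, n per arm = (z_{1-alpha/2} + z_{1-beta})^2 * [p_c(1 - p_c) + p_t(1 - p_t)] / (p_c - p_t)^2. The sketch below assumes a fixed relative risk reduction, so enriching for a higher control event rate p_c shrinks the required n; the function and its defaults are hypothetical, not the package's method.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, relative_reduction: float,
              alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-arm sample size for a two-sided two-proportion z-test (normal approximation)."""
    p_treat = p_control * (1.0 - relative_reduction)
    z = NormalDist().inv_cdf(1.0 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p_control * (1.0 - p_control) + p_treat * (1.0 - p_treat)
    return math.ceil(z ** 2 * variance / (p_control - p_treat) ** 2)
```

With a 30% relative risk reduction, this gives roughly 610 patients per arm at a 20% control event rate but roughly 2,840 per arm at 5%, which is the trade-off a prognostic biomarker is meant to exploit.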
For identifying, estimating, and plotting descriptive multidimensional item response theory models, restricted to 3D and dichotomous or polytomous data that fit the two-parameter logistic model or the graded response model. The method is foremost explorative and centered around the plot function that exposes item characteristics and constructs, represented by vector arrows, located in a three-dimensional interactive latent space. The results can be useful for item-level analysis as well as test development.
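For the dichotomous case, a common vector representation assumes the compensatory multidimensional two-parameter logistic model P(X = 1 | theta) = 1 / (1 + exp(-(a'theta + d))). The item arrow then points along a/||a||, its discrimination is summarized by MDISC = ||a||, and it starts at signed distance MDIFF = -d/||a|| from the origin, where P = 0.5. The sketch below computes these quantities; the names are hypothetical and the package's plotting conventions may differ.

```python
import math
from typing import List, Sequence, Tuple


def m2pl_prob(a: Sequence[float], d: float, theta: Sequence[float]) -> float:
    """Compensatory multidimensional 2PL probability of a correct response."""
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))


def item_vector(a: Sequence[float], d: float) -> Tuple[float, float, List[float], List[float]]:
    """Return (MDISC, MDIFF, direction cosines, arrow start point) for one item."""
    mdisc = math.sqrt(sum(ai * ai for ai in a))
    direction = [ai / mdisc for ai in a]
    mdiff = -d / mdisc
    # The arrow starts where the item's response probability equals 0.5 along its direction.
    start = [mdiff * c for c in direction]
    return mdisc, mdiff, direction, start
```

For example, an item with a = (1, 2, 2) and d = -3 has MDISC 3 and MDIFF 1, so its arrow starts one unit from the origin along (1/3, 2/3, 2/3).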
This package provides methods for fitting nonstationary Gaussian process models by spatial deformation, as introduced by Sampson and Guttorp (1992) <doi:10.1080/01621459.1992.10475181>, and by dimension expansion, as introduced by Bornn et al. (2012) <doi:10.1080/01621459.2011.646919>. Low-rank thin-plate regression splines, as developed in Wood, S.N. (2003) <doi:10.1111/1467-9868.00374>, are used to either transform co-ordinates or create new latent dimensions.