Three distinct methods are implemented for evaluating the sums of arbitrary negative binomial distributions. These methods are: Furman's exact probability mass function (Furman (2007) <doi:10.1016/j.spl.2006.06.007>), saddlepoint approximation, and a method of moments approximation. Functions are provided to calculate the density function, the distribution function and the quantile function of the convolutions in question given said evaluation methods. Functions for generating random deviates from negative binomial convolutions and for directly calculating the mean, variance, skewness, and excess kurtosis of said convolutions are also provided.
The semiparametric accelerated failure time (AFT) model is an attractive alternative to the Cox proportional hazards model. This package provides a suite of functions for fitting one popular rank-based estimator of the semiparametric AFT model, the regularized Gehan estimator. Specifically, we provide functions for cross-validation, prediction, coefficient extraction, and visualizing both trace plots and cross-validation curves. For further details, please see Suder, P. M. and Molstad, A. J., (2022) Scalable algorithms for semiparametric accelerated failure time models in high dimensions, Statistics in Medicine <doi:10.1002/sim.9264>.
Performance analysis workflow that combines the power of the R language (and the tidyverse realm) and many auxiliary tools to provide a consistent, flexible, extensible, fast, and versatile framework for the performance analysis of task-based applications that run on top of the StarPU
runtime (with its MPI (Message Passing Interface) layer for multi-node support). Its goal is to provide a fruitful prototypical environment to conduct performance analysis hypothesis-checking for task-based applications that run on heterogeneous (multi-GPU, multi-core) multi-node HPC (High-performance computing) platforms.
An introduction to several novel predictive variable selection methods for random forest. They are based on various variable importance methods (i.e., averaged variable importance (AVI), and knowledge informed AVI (i.e., KIAVI, and KIAVI2)) and predictive accuracy in stepwise algorithms. For details of the variable selection methods, please see: Li, J., Siwabessy, J., Huang, Z. and Nichol, S. (2019) <doi:10.3390/geosciences9040180>. Li, J., Alvarez, B., Siwabessy, J., Tran, M., Huang, Z., Przeslawski, R., Radke, L., Howard, F., Nichol, S. (2017). <DOI: 10.13140/RG.2.2.27686.22085>.
This package provides tools for researchers to explicitly show that their results comply to rules for statistical disclosure control imposed by research data centers. These tools help in checking descriptive statistics and models and in calculating extreme values that are not individual data. Also included is a simple function to create log files. The methods used here are described in the "Guidelines for the checking of output based on microdata research" by Bond, Brandt, and de Wolf (2015) <https://cros.ec.europa.eu/system/files/2024-02/Output-checking-guidelines.pdf>.
This package provides a nonparametric method to estimate Toeplitz covariance matrices from a sample of n independently and identically distributed p-dimensional vectors with mean zero. The data is preprocessed with the discrete cosine matrix and a variance stabilization transformation to obtain an approximate Gaussian regression setting for the log-spectral density function. Estimates of the spectral density function and the inverse of the covariance matrix are provided as well. Functions for simulating data and a protein data example are included. For details see (Klockmann, Krivobokova; 2023), <arXiv:2303.10018>
.
Warning: r128gain has been deprecated; the owner recommends using rsgain instead. It is kept here only because it's the only tagger I've found that does what I want -_-'
r128gain is a multi platform command line tool to scan your audio files and tag them with loudness metadata (ReplayGain v2 or Opus R128 gain format), to allow playback of several tracks or albums at a similar loudness level. r128gain can also be used as a Python module from other Python projects to scan and/or tag audio files.
Interface package for sala', the spatial network analysis library from the depthmapX
software application. The R parts of the code are based on the rdepthmap package. Allows for the analysis of urban and building-scale networks and provides metrics and methods usually found within the Space Syntax domain. Methods in this package are described by K. Al-Sayed, A. Turner, B. Hillier, S. Iida and A. Penn (2014) "Space Syntax methodology", and also by A. Turner (2004) <https://discovery.ucl.ac.uk/id/eprint/2651> "Depthmap 4: a researcher's handbook".
This package provides a tool for analyzing conjoint experiments using Bayesian Additive Regression Trees ('BART'), a machine learning method developed by Chipman, George and McCulloch
(2010) <doi:10.1214/09-AOAS285>. This tool focuses specifically on estimating, identifying, and visualizing the heterogeneity within marginal component effects, at the observation- and individual-level. It uses a variable importance measure ('VIMP') with delete-d jackknife variance estimation, following Ishwaran and Lu (2019) <doi:10.1002/sim.7803>, to obtain bias-corrected estimates of which variables drive heterogeneity in the predicted individual-level effects.
Tool for the development of multi-linear QSPR/QSAR models (Quantitative structure-property/activity relationship). Theses models are used in chemistry, biology and pharmacy to find a relationship between the structure of a molecule and its property (such as activity, toxicology but also physical properties). The various functions of this package allows: selection of descriptors based of variances, intercorrelation and user expertise; selection of the best multi-linear regression in terms of correlation and robustness; methods of internal validation (Leave-One-Out, Leave-Many-Out, Y-scrambling) and external using test sets.
This package provides tools for motif analysis in multi-level networks. Multi-level networks combine multiple networks in one, e.g. social-ecological networks. Motifs are small configurations of nodes and edges (subgraphs) occurring in networks. motifr can visualize multi-level networks, count multi-level network motifs and compare motif occurrences to baseline models. It also identifies contributions of existing or potential edges to motifs to find critical or missing edges. The package is in many parts an R wrapper for the excellent SESMotifAnalyser
Python package written by Tim Seppelt.
Extends multiverse package (Sarma A., Kale A., Moon M., Taback N., Chevalier F., Hullman J., Kay M., 2021) <doi:10.31219/osf.io/yfbwm>, which allows users perform to create explorable multiverse analysis in R. This extension provides an additional level of abstraction to the multiverse package with the aim of creating user friendly syntax to researchers, educators, and students in statistics. The mverse syntax is designed to allow piping and takes hints from the tidyverse grammar. The package allows users to define and inspect multiverse analysis using familiar syntax in R.
This package provides a flexible framework for power analysis using Monte Carlo simulation for settings in which considerations of the correlations between predictors are important. Users can set up a data generative model that preserves dependence structures among predictors given existing data (continuous, binary, or ordinal). Users can also generate power curves to assess the trade-offs between sample size, effect size, and power of a design. This package includes several statistical models common in environmental mixtures studies. For more details and tutorials, see Nguyen et al. (2022) <arXiv:2209.08036>
.
This package provides a comprehensive, user-friendly package for label-free proteomics data analysis and machine learning-based modeling. Data generated from MaxQuant
can be easily used to conduct differential expression analysis, build predictive models with top protein candidates, and assess model performance. promor includes a suite of tools for quality control, visualization, missing data imputation (Lazar et. al. (2016) <doi:10.1021/acs.jproteome.5b00981>), differential expression analysis (Ritchie et. al. (2015) <doi:10.1093/nar/gkv007>), and machine learning-based modeling (Kuhn (2008) <doi:10.18637/jss.v028.i05>).
Parallel Constraint Satisfaction (PCS) models are an increasingly common class of models in Psychology, with applications to reading and word recognition (McClelland
& Rumelhart, 1981), judgment and decision making (Glöckner & Betsch, 2008; Glöckner, Hilbig, & Jekel, 2014), and several other fields (e.g. Read, Vanman, & Miller, 1997). In each of these fields, they provide a quantitative model of psychological phenomena, with precise predictions regarding choice probabilities, decision times, and often the degree of confidence. This package provides the necessary functions to create and simulate basic Parallel Constraint Satisfaction networks within R.
Probability mass (d), distribution (p), quantile (q), and random number generating (r and rt) functions for the time-varying right-truncated geometric (tvgeom) distribution. Also provided are functions to calculate the first and second central moments of the distribution. The tvgeom distribution is similar to the geometric distribution, but the probability of success is allowed to vary at each time step, and there are a limited number of trials. This distribution is essentially a Markov chain, and it is useful for modeling Markov chain systems with a set number of time steps.
BASiCS is an integrated Bayesian hierarchical model to perform statistical analyses of single-cell RNA sequencing datasets in the context of supervised experiments (where the groups of cells of interest are known a priori. BASiCS performs built-in data normalisation (global scaling) and technical noise quantification (based on spike-in genes). BASiCS provides an intuitive detection criterion for highly (or lowly) variable genes within a single group of cells. Additionally, BASiCS can compare gene expression patterns between two or more pre-specified groups of cells.
This package implements methods for batch correction and integration of scRNA-seq datasets, based on the Seurat anchor-based integration framework. In particular, STACAS is optimized for the integration of heterogeneous datasets with only limited overlap between cell sub-types (e.g. TIL sets of CD8 from tumor with CD8/CD4 T cells from lymphnode), for which the default Seurat alignment methods would tend to over-correct biological differences. The 2.0 version of the package allows the users to incorporate explicit information about cell-types in order to assist the integration process.
Streaming JSON (ndjson) has one JSON record per-line and many modern ndjson files contain large numbers of records. These constructs may not be columnar in nature, but it is often useful to read in these files and "flatten" the structure out to enable working with the data in an R data.frame
-like context. Functions are provided that make it possible to read in plain ndjson files or compressed (gz
) ndjson files and either validate the format of the records or create "flat" data.table
structures from them.
For studying recurrent disease and death with competing risks, comparisons based on the well-known cumulative incidence function can be confounded by different prevalence rates of the competing events. Alternatively, comparisons of the conditional distribution of the survival time given the failure event type are more relevant for investigating the prognosis of different patterns of recurrence disease. This package implements a nonparametric estimator for the conditional cumulative incidence function and a nonparametric conditional bivariate cumulative incidence function for the bivariate gap times proposed in Huang et al. (2016) <doi:10.1111/biom.12494>.
API Client for the Climate Hazards Center CHIRPS and CHIRTS'. The CHIRPS data is a quasi-global (50°S â 50°N) high-resolution (0.05 arc-degrees) rainfall data set, which incorporates satellite imagery and in-situ station data to create gridded rainfall time series for trend analysis and seasonal drought monitoring. CHIRTS is a quasi-global (60°S â 70°N), high-resolution data set of daily maximum and minimum temperatures. For more details on CHIRPS and CHIRTS data please visit its official home page <https://www.chc.ucsb.edu/data>.
DataSHIELD
is an infrastructure and series of R packages that enables the remote and non-disclosive analysis of sensitive research data. This package is the DataSHIELD
interface implementation for Opal', which is the data integration application for biobanks by OBiBa
'. Participant data, once collected from any data source, must be integrated and stored in a central data repository under a uniform model. Opal is such a central repository. It can import, process, validate, query, analyze, report, and export data. Opal is the reference implementation of the DataSHIELD
infrastructure.
Fast and flexible Kalman filtering and smoothing implementation utilizing sequential processing, designed for efficient parameter estimation through maximum likelihood estimation. Sequential processing is a univariate treatment of a multivariate series of observations and can benefit from computational efficiency over traditional Kalman filtering when independence is assumed in the variance of the disturbances of the measurement equation. Sequential processing is described in the textbook of Durbin and Koopman (2001, ISBN:978-0-19-964117-8). FKF.SP was built upon the existing FKF package and is, in general, a faster Kalman filter/smoother.
This package implements the Generalized Method of Wavelet Moments with Exogenous Inputs estimator (GMWMX) presented in Voirol, L., Xu, H., Zhang, Y., Insolia, L., Molinari, R. and Guerrier, S. (2024) <doi:10.48550/arXiv.2409.05160>
. The GMWMX estimator allows to estimate functional and stochastic parameters of linear models with correlated residuals in presence of missing data. The gmwmx2 package provides functions to load and plot Global Navigation Satellite System (GNSS) data from the Nevada Geodetic Laboratory and functions to estimate linear model model with correlated residuals in presence of missing data.