This package provides a variety of functions for producing simple weighted statistics, such as weighted Pearson's correlations, partial correlations, Chi-Squared statistics, histograms, and t-tests. It also includes tools for quickly recoding survey data and plotting point estimates from interaction terms in regressions (and multiply imputed regressions). NOTE: Weighted partial correlation calculations have been pulled to address a bug.
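As an illustration of what a weighted Pearson's correlation computes, here is a minimal Python sketch. It is not the package's code or API; the function name `weighted_pearson` and its arguments are hypothetical, and it assumes non-negative weights that sum to a positive value.

```python
import math


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation of x and y (hypothetical helper).

    Each observation i contributes to the means, variances, and
    covariance in proportion to its weight w[i].
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted means.
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total

    # Weighted sums of cross-products and squares; the common 1/total
    # factor cancels in the ratio, so it is omitted.
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)
```

With equal weights this reduces to the ordinary Pearson correlation, and an observation with weight zero has no influence on the result.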
For tree ensembles such as random forests, regularized random forests, and gradient boosted trees, this package provides functions for extracting, measuring, and pruning rules; selecting a compact rule set; summarizing rules into a learner; calculating frequent variable interactions; and formatting rules in LaTeX code. Reference: "Interpreting tree ensembles with inTrees" (Houtao Deng, 2019, <doi:10.1007/s41060-018-0144-8>).
Receiver Operating Characteristic (ROC)-guided survival trees and ensemble algorithms are implemented, providing a unified framework for tree-structured analysis with censored survival outcomes. A time-invariant partition scheme on the survivor population was considered to incorporate time-dependent covariates. Motivated by ideas of randomized tests, generalized time-dependent ROC curves were used to evaluate the performance of survival trees and establish the optimality of the target hazard/survival function. The optimality of the target hazard function motivates us to use a weighted average of the time-dependent area under the curve (AUC) on a set of time points to evaluate the prediction performance of survival trees and to guide splitting and pruning. A detailed description of the implemented methods can be found in Sun et al. (2019) <arXiv:1809.05627>.
Balancing quasi-experimental field research for effects of covariates is fundamental for drawing causal inference. Propensity Score Matching deals with this issue, but current techniques are restricted to binary treatment variables. Moreover, they provide several solutions without providing a comprehensive framework for choosing the best model. The MAGMA R-package addresses these restrictions by offering nearest neighbor matching for two to four groups. It also includes the option to match data of a 2x2 design. In addition, MAGMA includes a framework for evaluating the post-matching balance. The package includes functions for the matching process and matching reporting. We provide a tutorial on MAGMA as a vignette. More information on MAGMA can be found in Feuchter, M. D., Urban, J., Scherrer, V., Breit, M. L., and Preckel, F. (2022) <https://osf.io/p47nc/>.
This package provides functions to support rigorous retrospective data harmonization processing, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the harmonization process, apply specified processing rules to generate harmonized data, diagnose processing errors, and summarize and evaluate harmonized outputs. The main inputs that define the processing are a DataSchema (list and definitions of harmonized variables to be generated) and Data Processing Elements (processing rules to be applied to generate harmonized variables from study-specific variables). The main outputs of processing are harmonized datasets, associated metadata, and tabular and visual summary reports. As described in Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I et al. (2017) <doi:10.1093/ije/dyw075>).
Generation of natural-looking noise has many applications within simulation, procedural generation, and art, to name a few. The ambient package provides an interface to the FastNoise C++ library and allows for efficient generation of Perlin, simplex, Worley, cubic, value, and white noise with optional perturbation in either 2, 3, or 4 (in case of simplex and white noise) dimensions.
Power and associated functions useful in prospective planning and monitoring of a clinical trial when a recurrent event endpoint is to be assessed by the robust Andersen-Gill model; see Lin, Wei, Yang, and Ying (2000) <doi:10.1111/1467-9868.00259>. The equations developed in Ingel and Jahn-Eimermacher (2014) <doi:10.1002/bimj.201300090> and their consequences are employed.
Bell regression models for count data with overdispersion. The implemented models account for ordinary and zero-inflated regression models under both frequentist and Bayesian approaches. Theoretical details regarding the models implemented in the package can be found in Castellares et al. (2018) <doi:10.1016/j.apm.2017.12.014> and Lemonte et al. (2020) <doi:10.1080/02664763.2019.1636940>.
This package provides a reliable and efficient tool for cleaning univariate time series data. It implements procedures for automating the cleaning process and integrates with existing tools for missing value imputation and outlier detection. It also provides a way of visualizing large time series data at different resolutions.
Several functions for working with mixed effects regression models for limited dependent variables. The functions facilitate post-estimation of model predictions or margins, and comparisons between model predictions for assessing or probing moderation. Additional helper functions facilitate model comparisons and implement simulation-based inference for model predictions of alternative-specific outcome models. See also Melamed and Doan (2024, ISBN: 978-1032509518).
This package contains the support functions for the Time Series Analysis book. We present a function to calculate MSE and MAE for inputs of actual and forecast values. We also have the code for disaggregation as found in Wei and Stram (1990, <doi:10.1111/j.2517-6161.1990.tb01799.x>), and Hodgess and Wei (1996, "Temporal Disaggregation of Time Series").
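For reference, the two forecast-accuracy measures named above can be sketched in a few lines of Python. This is a generic illustration rather than the package's own function; the names `mse` and `mae` and their signatures are assumptions made for the example.

```python
def _errors(actual, forecast):
    """Return the forecast errors actual[i] - forecast[i]."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return [a - f for a, f in zip(actual, forecast)]


def mse(actual, forecast):
    """Mean squared error: the average of the squared forecast errors."""
    errors = _errors(actual, forecast)
    return sum(e * e for e in errors) / len(errors)


def mae(actual, forecast):
    """Mean absolute error: the average of the absolute forecast errors."""
    errors = _errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)
```

MSE penalizes large errors more heavily than MAE because each error is squared before averaging.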
This package provides a port of the web-based software DAGitty, available at <https://dagitty.net>, for analyzing structural causal models (also known as directed acyclic graphs or DAGs). This package computes covariate adjustment sets for estimating causal effects, enumerates instrumental variables, derives testable implications (d-separation and vanishing tetrads), generates equivalent models, and includes a simple facility for data simulation.
Estimation of the components of an ETAS (Epidemic Type Aftershock Sequence) model for earthquake description. Non-parametric background seismicity can be estimated through FLP (Forward Likelihood Predictive). New in version 2.0.0: covariates have been introduced to explain the effects of external factors on the induced seismicity, and the parametrization has been changed. See Chiodi and Adelfio (2017) <doi:10.18637/jss.v076.i03>.
The books are "Linear Models with R", published by CRC Press (1st Ed. August 2004, 2nd Ed. July 2014, 3rd Ed. February 2025; ISBN 9781439887332); "Extending the Linear Model with R", published by CRC Press (1st Ed. December 2005, 2nd Ed. March 2016; ISBN 9781584884248); and "Practical Regression and ANOVA in R", contributed documentation on CRAN (now very dated).
This package provides a ggplot2-consistent approach to generating 2D displays of volumetric brain imaging data. Display data from multiple NIfTI images using standard ggplot2 conventions such as scales, limits, and themes to control the appearance of displays. The resulting plots are returned as patchwork objects, inheriting from ggplot, allowing for any standard modifications of display aesthetics supported by ggplot2.
Implementation of the Discrete Symmetric Optimal Kernel for estimating count data distributions, as described by T. Senga Kiessé and G. Durrieu (2024) <doi:10.1016/j.spl.2024.110078>. The nonparametric estimator using the discrete symmetric optimal kernel is illustrated on simulated data sets and a real-world data set included in the package, in comparison with two other discrete symmetric kernels.
This package contains (1) event-related brain potential data recorded from 10 participants at electrodes Fz, Cz, Pz, and Oz (0 to 300 ms) in the context of Antoine Tremblay's PhD thesis (Tremblay, 2009); (2) ERP amplitudes at electrode Fz restricted to the 100 to 175 millisecond time window; and (3) plotting data generated from a linear mixed-effects model.
This package provides a collection of miscellaneous methods to simplify various tasks, including plotting, data.frame and matrix transformations, environment functions, regular expression methods, and string and logical operations, as well as numerical and statistical tools. Most of the methods are simple but useful wrappers of common base R functions, which extend S3 generics or provide default values for important parameters.
This package provides tools for Natural Language Processing in French and texts from Marcel Proust's collection "A La Recherche Du Temps Perdu". The novels contained in this collection are "Du cote de chez Swann", "A l'ombre des jeunes filles en fleurs", "Le Cote de Guermantes", "Sodome et Gomorrhe I et II", "La Prisonniere", "Albertine disparue", and "Le Temps retrouve".
This package implements the methods proposed by Olley, G.S. and Pakes, A. (1996) <doi:10.2307/2171831>, Levinsohn, J. and Petrin, A. (2003) <doi:10.1111/1467-937X.00246>, Ackerberg, D.A., Caves, K. and Frazer, G. (2015) <doi:10.3982/ECTA13408> and Wooldridge, J.M. (2009) <doi:10.1016/j.econlet.2009.04.026> for structural productivity estimation.
Uses the fst package to store genotype probabilities on disk for the qtl2 package. These genotype probabilities are a central data object for mapping quantitative trait loci (QTL), but they can be quite large. The facilities in this package enable the genotype probabilities to be stored on disk, leading to reduced memory usage with only a modest increase in computation time.
This package provides methods for the analysis of signed networks. This includes several measures for structural balance as introduced by Cartwright and Harary (1956) <doi:10.1037/h0046049>, blockmodeling algorithms from Doreian (2008) <doi:10.1016/j.socnet.2008.03.005>, various centrality indices, and projections of signed two-mode networks introduced by Schoch (2020) <doi:10.1080/0022250X.2019.1711376>.
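To make the idea of structural balance concrete: in the Cartwright and Harary sense, a triangle is balanced when the product of its three edge signs is positive. The Python sketch below computes the share of balanced triangles in a small signed graph. It illustrates one common triangle-based measure and is not necessarily how the package computes its indices; the function name `triangle_balance` and the edge representation are hypothetical.

```python
from itertools import combinations


def triangle_balance(nodes, signs):
    """Fraction of balanced triangles in a signed, undirected graph.

    signs maps frozenset({u, v}) to +1 or -1 for each existing edge.
    Returns NaN if the graph contains no triangles.
    """
    balanced = 0
    total = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in signs for e in edges):
            total += 1
            # A triangle is balanced when it has an even number of
            # negative edges, i.e. the product of signs is positive.
            if signs[edges[0]] * signs[edges[1]] * signs[edges[2]] > 0:
                balanced += 1
    return balanced / total if total else float("nan")


# Example: a complete signed graph on four nodes.
example_signs = {
    frozenset(("a", "b")): 1,
    frozenset(("b", "c")): 1,
    frozenset(("a", "c")): 1,
    frozenset(("a", "d")): -1,
    frozenset(("b", "d")): -1,
    frozenset(("c", "d")): 1,
}
example_balance = triangle_balance(["a", "b", "c", "d"], example_signs)
```

In the example, triangles a-b-c and a-b-d are balanced while a-c-d and b-c-d are not, so half of the triangles are balanced.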
An implementation of a boosted Tweedie compound Poisson model proposed by Yang, Y., Qian, W. and Zou, H. (2018) <doi:10.1080/07350015.2016.1200981>. It is capable of fitting a flexible nonlinear Tweedie compound Poisson model (or a gamma model) and capturing high-order interactions among predictors. This package is based on the gbm package originally developed by Greg Ridgeway.
The goal of MineICA is to perform Independent Component Analysis (ICA) on multiple transcriptome datasets, integrating additional data (e.g., molecular, clinical, and pathological). This integrative ICA helps the biological interpretation of the components by studying their association with variables (e.g., sample annotations) and gene sets, and enables the comparison of components from different datasets using a correlation-based graph.