Joint frailty models have been widely used to study the associations between recurrent events and a survival outcome. However, existing joint frailty models only consider one or a few recurrent events and cannot deal with high-dimensional recurrent events. This package can be used to fit our recently developed penalized joint frailty model that can handle high-dimensional recurrent events. Specifically, an adaptive lasso penalty is imposed on the parameters for the effects of the recurrent events on the survival outcome, which allows for variable selection. The algorithm is based on the Gaussian variational approximation method, which makes it computationally efficient.
Generates and evaluates D, I, A, Alias, E, T, and G optimal designs. Supports generation and evaluation of blocked and split/split-split/.../N-split plot designs. Includes parametric and Monte Carlo power evaluation functions, and supports calculating power for censored responses. Provides a framework to evaluate power using functions provided in other packages or written by the user. Includes a Shiny graphical user interface that displays the underlying code used to create and evaluate the design to improve ease-of-use and make analyses more reproducible. For details, see Morgan-Wall et al. (2021) <doi:10.18637/jss.v099.i01>.
The rema package implements a permutation-based approach for binary meta-analyses of 2x2 tables, founded on conditional logistic regression, that provides more reliable statistical tests when heterogeneity is observed in rare event data (Zabriskie et al. 2021 <doi:10.1002/sim.9142>). To adjust for the effect of heterogeneity, this method conditions on the sufficient statistic of a proxy for the heterogeneity effect as opposed to estimating the heterogeneity variance. While this results in the model not strictly falling under the random-effects framework, it is akin to a random-effects approach in that it assumes differences in variability due to treatment. Further, this method does not rely on large-sample approximations or continuity corrections for rare event data. This method uses the permutational distribution of the test statistic instead of asymptotic approximations for inference. The number of observed events drives the computational complexity of creating this permutational distribution. Accordingly, for this method to be computationally feasible, it should only be applied to meta-analyses with a relatively low number of observed events. To create this permutational distribution, a network algorithm, based on the work of Mehta et al. (1992) <doi:10.2307/1390598> and Corcoran et al. (2001) <doi:10.1111/j.0006-341x.2001.00941.x>, is employed using C++ and integrated into the package.
The aim of XINA is to determine which proteins exhibit similar patterns within and across experimental conditions, since proteins with co-abundance patterns may have common molecular functions. XINA imports multiple datasets, tags the datasets in silico, and combines the data for subsequent subgrouping into multiple clusters. The result is a single output depicting the variation across all conditions. XINA not only extracts co-abundance profiles within and across experiments, but also incorporates protein-protein interaction databases and integrative resources such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) to infer interactors and molecular functions, respectively, and produces intuitive graphical outputs.
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in http://doi.org/10.18637/jss.v045.i03. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
This package implements the fast cross-validation via sequential testing (CVST) procedure. CVST is an improved cross-validation procedure which uses non-parametric testing coupled with sequential analysis to determine the best parameter set on linearly increasing subsets of the data. In addition to CVST, the package contains an implementation of ordinary k-fold cross-validation with a flexible and powerful set of helper objects and methods to handle the overall model selection process. The implementations of Cochran's Q test with permutations and of Wald's sequential testing framework are generic and can therefore also be used in other contexts.
Rclone is a command line program to sync files and directories to and from different cloud storage providers.
Features include:
MD5/SHA1 hashes checked at all times for file integrity
Timestamps preserved on files
Partial syncs supported on a whole file basis
Copy mode to just copy new/changed files
Sync (one way) mode to make a directory identical
Check mode to check for file hash equality
Can sync to and from network, e.g., two different cloud accounts
Optional encryption (Crypt)
Optional cache (Cache)
Optional FUSE mount (rclone mount)
Multi-binary response models are a class of models that allow for the estimation of multiple binary outcomes simultaneously. This package provides functions to estimate and simulate these models using the Discrete Exponential-Family Models [DEFM] framework. In it, we implement the models described in Vega Yon, Valente, and Pugh (2023) <doi:10.48550/arXiv.2211.00627>. DEFMs include Exponential-Family Random Graph Models [ERGMs], which characterize graphs using sufficient statistics, an idea that is also at the core of DEFMs. Using sufficient statistics, we can describe the data through meaningful motifs, for example, transitions between different states, joint distribution of the outcomes, etc.
We introduce a novel ensemble-based explainable machine learning model using the Model Confidence Set (MCS) and a two-stage Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) algorithm. The model combines the predictive capabilities of different machine learning models and integrates the interpretability of explainability methods. The package has been developed using the algorithms of Paul et al. (2023) <doi:10.1007/s40009-023-01218-x> and Yeasin and Paul (2024) <doi:10.1007/s11227-023-05542-3>.
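As a rough illustration of the ranking step, here is a minimal sketch of ordinary single-stage TOPSIS in Python. It is not the package's two-stage procedure and does not use the Model Confidence Set; the function name `topsis`, the equal weights, and the model scores are all hypothetical values chosen for the example.

```python
import math

def topsis(matrix, weights, benefit):
    """Return TOPSIS closeness scores (higher is better) for each row.

    matrix  : one row per alternative, one column per criterion
    weights : criterion weights
    benefit : True where larger is better, False where smaller is better
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal and anti-ideal points: max for benefit criteria, min for cost criteria.
    cols = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    scores = []
    for row in v:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti)
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical accuracy measures for three models: RMSE, MAE (costs), R^2 (benefit).
models = [[0.9, 0.7, 0.80], [1.2, 0.9, 0.70], [1.0, 0.8, 0.75]]
scores = topsis(models, weights=[1 / 3] * 3, benefit=[False, False, True])
print(scores)
```

A model that is best on every criterion coincides with the ideal point and scores 1, while one that is worst on every criterion scores 0. The sketch assumes the alternatives are not all identical, since the denominator would otherwise be zero.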
The HMS (Hierarchic Memetic Strategy) is a composite global optimization strategy consisting of a multi-population evolutionary strategy and some auxiliary methods. The HMS makes use of a dynamically-evolving data structure that provides an organization among the component populations. It is a tree with a fixed maximal height and variable internal node degree. Each component population is governed by a particular evolutionary engine. This package provides a simple R implementation with examples of using different genetic algorithms as the population engines. References: J. Sawicki, M. Łoś, M. Smołka, J. Alvarez-Aramberri (2022) <doi:10.1007/s11047-020-09836-w>.
This package provides a framework that aims to offer a standardized computational environment for specialist work via object classes that represent data coded by samples, taxa and segments (i.e. subpopulations, repeated measures). It supports easy processing of the data along with cross tabulation and relational data tables for samples and taxa. An object of class 'mefa' is a project-specific compendium of the data and can be easily used in further analyses. Methods are provided for extraction, aggregation, conversion, plotting, summary and reporting of 'mefa' objects. Reports can be generated in plain text or LaTeX format. The vignette contains worked examples.
Features tools for network data analysis and community detection. Provides multiple methods for fitting, model selection and goodness-of-fit testing in degree-corrected stochastic block models. Most of the computations are fast and scalable for sparse networks, especially for Poisson versions of the models. Implements the following: Amini, Chen, Bickel and Levina (2013) <doi:10.1214/13-AOS1138>, Bickel and Sarkar (2015) <doi:10.1111/rssb.12117>, Lei (2016) <doi:10.1214/15-AOS1370>, Wang and Bickel (2017) <doi:10.1214/16-AOS1457>, Zhang and Amini (2020) <arXiv:2012.15047>, Le and Levina (2022) <doi:10.1214/21-EJS1971>.
This package implements Penalized Fast Causal Inference (PFCI), a two-stage causal structure learning procedure for high-dimensional settings with potential latent variables and selection bias. In the first stage, neighborhood selection via the Lasso constructs a sparse undirected skeleton. In the second stage, the Fast Causal Inference (FCI) algorithm orients edges on this reduced graph, producing a Partial Ancestral Graph (PAG) that accounts for latent confounders. The method is consistent under sparsity assumptions and substantially faster than standard FCI and RFCI in high dimensions. See Pal, Ghosh, and Yang (2025) <doi:10.48550/arXiv.2507.00173> for the underlying theory.
Automates many of the tasks associated with quantitative discourse analysis of transcripts, including frequency counts of sentence types, words, sentences, turns of talk and syllables, along with other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher level analysis and visualization of text. This affords the user a more efficient and targeted analysis. qdap is designed for transcript analysis; however, many functions are applicable to other areas of Text Mining/Natural Language Processing.
This package provides a terribly-simple database for numeric time series, written purely in R, so no external database software is needed. Series are stored in plain-text files (the most portable and enduring file type) in CSV format. Timestamps are encoded using R's native numeric representation for 'Date'/'POSIXct', which makes them fast to parse, but keeps them accessible with other software. The package provides tools for saving and updating series in this standardised format, for retrieving and joining data, for summarising files and directories, and for coercing series from and to other data types (such as zoo series).
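To show why numeric timestamps stay accessible to other software, here is a minimal Python sketch that writes and reads a daily series with dates stored as days since 1970-01-01, which is how R represents a 'Date' numerically. The column names `timestamp` and `value` and the helper names are assumptions made for illustration, not the package's documented file layout.

```python
import csv
import io
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # origin of R's numeric 'Date' representation

def write_series(dates, values, handle):
    """Write (date, value) pairs as CSV with numeric day-count timestamps."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["timestamp", "value"])  # hypothetical header names
    for d, v in zip(dates, values):
        writer.writerow([(d - EPOCH).days, v])

def read_series(handle):
    """Read the CSV back into a list of (date, float) pairs."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [(EPOCH + timedelta(days=int(t)), float(v)) for t, v in reader]

# An in-memory buffer stands in for a plain-text file on disk.
buf = io.StringIO()
write_series([date(2024, 1, 1), date(2024, 1, 2)], [1.5, 2.0], buf)
buf.seek(0)
series = read_series(buf)
print(buf.getvalue())
```

Parsing reduces to reading an integer per row, with no date-format string involved, which is the speed advantage the description refers to.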
Create interactive maps that can keep up with complex visualisations and large datasets, with this useful interface to the MapLibre GL JS (<https://maplibre.org/maplibre-gl-js/docs/>) library. Users can create maps directly in the console, or as an HTML widget within Shiny web applications, and render spatial data quickly with many customisable options (clusters, custom icons, map layers, and backgrounds). The goal of the package is to make it easier to interpret and explore large spatial datasets within the context of a Shiny dashboard, without having long loading times waiting for a map to update with new data.
ASEB is an R package to predict lysine sites that can be acetylated by a specific KAT (K-acetyl-transferases) family. Lysine acetylation is a well-studied posttranslational modification on many kinds of proteins. About four thousand lysine acetylation sites and over 20 lysine KATs have been identified. However, which KAT is responsible for the acetylation of a given protein or lysine site is mostly unknown. In this package, we use a GSEA-like (Gene Set Enrichment Analysis) method to make predictions. The GSEA method was developed and successfully used to detect coordinated expression changes and find the putative functions of long non-coding RNAs.
Zero-variance control variates (ZV-CV) is a post-processing method to reduce the variance of Monte Carlo estimators of expectations using the derivatives of the log target. Once the derivatives are available, the only additional computational effort is in solving a linear regression problem. This method has been extended to higher dimensions using regularisation. This package can be used to easily perform ZV-CV or regularised ZV-CV when a set of samples, derivatives and function evaluations are available. Additional functions for applying ZV-CV to two estimators for the normalising constant of the posterior distribution in Bayesian statistics are also supplied.
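The regression step can be sketched in a few lines of Python for a one-dimensional target and a first-order polynomial, where the control variate is proportional to the derivative of the log target. This is an illustrative sketch and does not use the package's interface; the function name `zvcv_first_order` and the N(2, 1) example target are assumptions.

```python
import random
import statistics

def zvcv_first_order(f_vals, grads):
    """1-D first-order ZV-CV estimate of E[f].

    Regresses f on the log-target derivative and returns the fitted
    intercept, which is the variance-reduced estimate.
    """
    f_bar = statistics.fmean(f_vals)
    g_bar = statistics.fmean(grads)
    cov = sum((f - f_bar) * (g - g_bar) for f, g in zip(f_vals, grads))
    var = sum((g - g_bar) ** 2 for g in grads)
    beta = cov / var
    # The derivative of the log target has expectation zero under the target,
    # so the regression intercept still estimates E[f].
    return f_bar - beta * g_bar

# Example target: N(2, 1), so d/dx log p(x) = -(x - 2). Estimate E[x].
rng = random.Random(0)
samples = [rng.gauss(2.0, 1.0) for _ in range(200)]
grads = [-(x - 2.0) for x in samples]
plain = statistics.fmean(samples)
zv = zvcv_first_order(samples, grads)
print(plain, zv)
```

In this example the integrand is a linear function of the derivative, so the control variate removes the Monte Carlo error up to floating-point rounding. For general integrands the variance is reduced rather than eliminated.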
This package provides utilities to calculate the probabilities of various dice-rolling events, such as the probability of rolling a four-sided die six times and getting a 4, a 3, and either a 1 or 2 among the six rolls (in any order); the probability of rolling two six-sided dice three times and getting a 10 on the first roll, followed by a 4 on the second roll, followed by anything but a 7 on the third roll; or the probabilities of each possible sum of rolling five six-sided dice, dropping the lowest two rolls, and summing the remaining dice.
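Two of the events described above can be checked by brute-force enumeration. The following Python sketch is not the package's interface; the function name `sum_distribution` and its arguments are hypothetical, and exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_distribution(n_dice, n_sides, drop_lowest=0):
    """Exact distribution of the sum of n_dice dice after dropping the lowest rolls."""
    counts = Counter()
    for roll in product(range(1, n_sides + 1), repeat=n_dice):
        kept = sorted(roll)[drop_lowest:]  # discard the lowest `drop_lowest` dice
        counts[sum(kept)] += 1
    total = n_sides ** n_dice
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}

# Two six-sided dice rolled three times: a 10, then a 4, then anything but a 7.
two_d6 = sum_distribution(2, 6)
p_sequence = two_d6[10] * two_d6[4] * (1 - two_d6[7])

# Five six-sided dice, dropping the lowest two and summing the rest.
keep_three = sum_distribution(5, 6, drop_lowest=2)

print(p_sequence)
for total, prob in keep_three.items():
    print(total, prob)
```

Enumeration is exact but grows as the number of sides raised to the number of dice, so it only suits small cases like these.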
This package provides functions for fitting and doing predictions with Gaussian process models using Vecchia's (1988) approximation. The package also includes functions for reordering input locations, finding ordered nearest neighbors (with help from the FNN package), grouping operations, and conditional simulations. Covariance functions for spatial and spatial-temporal data on Euclidean domains and spheres are provided. The original approximation is due to Vecchia (1988) <http://www.jstor.org/stable/2345768>, and the reordering and grouping methods are from Guinness (2018) <doi:10.1080/00401706.2018.1437476>. Model fitting employs a Fisher scoring algorithm described in Guinness (2019) <doi:10.48550/arXiv.1905.08374>.
This package contains methods for fitting Generalized Linear Models (GLMs) and Generalized Additive Models (GAMs). Generalized regression models are common methods for handling data for which assuming Gaussian-distributed errors is not appropriate. For instance, if the response of interest is binary, count, or proportion data, one can instead model the expectation of the response based on an appropriate data-generating distribution. This package provides methods for fitting GLMs and GAMs under Beta regression, Poisson regression, Gamma regression, and Binomial regression (currently GLM only) settings. Models are fit using local scoring algorithms described in Hastie and Tibshirani (1990) <doi:10.1214/ss/1177013604>.
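For a GLM, local scoring reduces to iteratively reweighted least squares (IRLS). The following Python sketch shows that iteration for Poisson regression with a log link and a single covariate. It is a simplified illustration and not the package's code; the function name `fit_poisson_irls`, the starting values, and the toy data are assumptions.

```python
import math

def fit_poisson_irls(x, y, n_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by iteratively reweighted least squares."""
    # Start from the data, shifted so the logarithm is defined when y = 0.
    eta = [math.log(yi + 0.5) for yi in y]
    b0 = b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted least squares for intercept and slope (2x2 normal equations).
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        if converged:
            break
    return b0, b1

# Toy count data, made up for the example.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 2.0, 4.0, 6.0]
b0, b1 = fit_poisson_irls(x, y)
print(b0, b1)
```

At convergence the fitted means satisfy the Poisson score equations, so the residuals sum to zero both unweighted and weighted by the covariate. A GAM replaces the weighted least-squares step with a weighted smoother.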
Detailed functionality for working with the univariate and multivariate Generalized Hyperbolic distribution and its special cases (Hyperbolic (hyp), Normal Inverse Gaussian (NIG), Variance Gamma (VG), skewed Student-t and Gaussian distribution). In particular, it contains fitting procedures, an AIC-based model selection routine, and functions for the computation of density, quantile, probability, random variates, expected shortfall and some portfolio optimization and plotting routines as well as the likelihood ratio test. In addition, it contains the Generalized Inverse Gaussian distribution. See Chapter 3 of A. J. McNeil, R. Frey, and P. Embrechts. Quantitative risk management: Concepts, techniques and tools. Princeton University Press, Princeton (2005).
Analysis, imputation, and multiple imputation of count data using covariates. LORI uses a log-linear Poisson model where main row and column effects, as well as effects of known covariates and interaction terms can be fitted. The estimation procedure is based on the convex optimization of the Poisson loss penalized by a Lasso type penalty and a nuclear norm. LORI returns estimates of main effects, covariate effects and interactions, as well as an imputed count table. The package also contains a multiple imputation procedure. The methods are described in Robin, Josse, Moulines and Sardy (2019) <doi:10.1016/j.jmva.2019.04.004>.
To account for non-stationary multivariate data, this package implements a framework that combines copulas and marginal distributions. In addition to modeling and parameter estimation, it allows the computation and visualization of multivariate quantile curves for given events. This package is useful for a variety of disciplines such as finance, climatology and particularly for hydrological applications, where dependence structures and marginal parameters may vary over time. This framework, based on Chebana & Ouarda (2021) <doi:10.1016/j.jhydrol.2020.125907>, integrates both multivariate and non-stationary aspects to be more accurate (e.g. for risk assessment) and more realistic (e.g. considering climate change).