I tend to repeat the same code chunks over and over again. At first, this was fine for me and I paid little attention to such redundancies. A little later, when I got tired of manually replacing Linux file paths with the corresponding Windows versions, and vice versa, I started to stuff some very frequently used work steps into functions and, even later, into a proper R package. And that's what this package is - a hodgepodge of various R functions meant to simplify (my) everyday coding work without, at the same time, being devoted to a particular scope of application.
Joint frailty models have been widely used to study the associations between recurrent events and a survival outcome. However, existing joint frailty models consider only one or a few recurrent events and cannot deal with high-dimensional recurrent events. This package fits our recently developed penalized joint frailty model that can handle high-dimensional recurrent events. Specifically, an adaptive lasso penalty is imposed on the parameters for the effects of the recurrent events on the survival outcome, which allows for variable selection. In addition, our algorithm, which is based on the Gaussian variational approximation method, is computationally efficient.
Generates and evaluates D, I, A, Alias, E, T, and G optimal designs. Supports generation and evaluation of blocked and split/split-split/.../N-split plot designs. Includes parametric and Monte Carlo power evaluation functions, and supports calculating power for censored responses. Provides a framework to evaluate power using functions provided in other packages or written by the user. Includes a Shiny graphical user interface that displays the underlying code used to create and evaluate the design to improve ease-of-use and make analyses more reproducible. For details, see Morgan-Wall et al. (2021) <doi:10.18637/jss.v099.i01>.
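As a hedged sketch of a typical workflow (argument defaults may differ across package versions): generate a small D-optimal design from a candidate set with gen_design() and evaluate its parametric power with eval_design().

    library(skpr)

    # Candidate set: full factorial over two experimental factors
    candidates <- expand.grid(temp = c(-1, 0, 1), pressure = c(-1, 1))

    # Generate a 12-run D-optimal design for a main-effects model,
    # then evaluate parametric power at a 5% significance level.
    design <- gen_design(candidateset = candidates,
                         model = ~ temp + pressure,
                         trials = 12, optimality = "D")
    eval_design(design, model = ~ temp + pressure, alpha = 0.05)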
We introduce a novel ensemble-based explainable machine learning model using the Model Confidence Set (MCS) and a two-stage Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) algorithm. The model combines the predictive capabilities of different machine learning models and integrates the interpretability of explainability methods. The proposed algorithm is built on a two-stage TOPSIS framework. The package implements the algorithms of Paul et al. (2023) <doi:10.1007/s40009-023-01218-x> and Yeasin and Paul (2024) <doi:10.1007/s11227-023-05542-3>.
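As a minimal base-R sketch of the TOPSIS idea itself (not this package's interface): rank candidate models by their closeness to the ideal solution across several error criteria, where lower values are better. The model names and error values below are made up for illustration.

    # Rank alternatives (rows of X) under cost criteria with weights
    topsis_rank <- function(X, weights) {
      R <- sweep(X, 2, sqrt(colSums(X^2)), "/")     # vector normalisation
      V <- sweep(R, 2, weights, "*")                # weighted normalised matrix
      ideal <- apply(V, 2, min)                     # all criteria are costs here
      anti  <- apply(V, 2, max)
      d_pos <- sqrt(rowSums(sweep(V, 2, ideal)^2))  # distance to ideal solution
      d_neg <- sqrt(rowSums(sweep(V, 2, anti)^2))   # distance to anti-ideal
      closeness <- d_neg / (d_pos + d_neg)
      order(closeness, decreasing = TRUE)           # best alternative first
    }

    errors <- rbind(arima = c(1.2, 0.90, 8.1),
                    ann   = c(1.0, 0.80, 7.4),
                    svr   = c(1.1, 0.85, 7.9))
    colnames(errors) <- c("RMSE", "MAE", "MAPE")
    topsis_rank(errors, weights = c(1, 1, 1) / 3)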
The HMS (Hierarchic Memetic Strategy) is a composite global optimization strategy consisting of a multi-population evolutionary strategy and some auxiliary methods. The HMS makes use of a dynamically-evolving data structure that provides an organization among the component populations. It is a tree with a fixed maximal height and variable internal node degree. Each component population is governed by a particular evolutionary engine. This package provides a simple R implementation with examples of using different genetic algorithms as the population engines. References: J. Sawicki, M. Łoś, M. Smołka, J. Alvarez-Aramberri (2022) <doi:10.1007/s11047-020-09836-w>.
This framework package provides a standardized computational environment for specialist work via object classes that represent data coded by samples, taxa and segments (i.e. subpopulations, repeated measures). It supports easy processing of the data along with cross tabulation and relational data tables for samples and taxa. An object of class 'mefa' is a project-specific compendium of the data and can easily be used in further analyses. Methods are provided for extraction, aggregation, conversion, plotting, summary and reporting of 'mefa' objects. Reports can be generated in plain text or LaTeX format. The vignette contains worked examples.
Provides tools for network data analysis and community detection, including multiple methods for fitting, model selection and goodness-of-fit testing in degree-corrected stochastic block models. Most of the computations are fast and scalable for sparse networks, especially for Poisson versions of the models. Implements methods from: Amini, Chen, Bickel and Levina (2013) <doi:10.1214/13-AOS1138>; Bickel and Sarkar (2015) <doi:10.1111/rssb.12117>; Lei (2016) <doi:10.1214/15-AOS1370>; Wang and Bickel (2017) <doi:10.1214/16-AOS1457>; Zhang and Amini (2020) <arXiv:2012.15047>; Le and Levina (2022) <doi:10.1214/21-EJS1971>.
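As a minimal base-R sketch of the model family involved (not this package's interface), one can simulate a sparse Poisson degree-corrected stochastic block model as follows.

    set.seed(1)
    n <- 200; K <- 3
    z     <- sample(K, n, replace = TRUE)           # community labels
    theta <- rexp(n); theta <- theta / mean(theta)  # degree parameters
    B     <- matrix(0.02, K, K); diag(B) <- 0.1     # block connectivity
    lambda <- tcrossprod(theta) * B[z, ][, z]       # expected edge counts
    A <- matrix(rpois(n * n, lambda), n, n)         # Poisson adjacency draws
    A[lower.tri(A, diag = TRUE)] <- 0
    A <- A + t(A)                                   # symmetric, zero diagonal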
Automates many of the tasks associated with quantitative discourse analysis of transcripts, including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher-level analysis and visualization of text. This affords the user a more efficient and targeted analysis. qdap is designed for transcript analysis; however, many functions are applicable to other areas of text mining and natural language processing.
This package provides a terribly simple database for numeric time series, written purely in R, so no external database software is needed. Series are stored in plain-text files (the most portable and enduring file type) in CSV format. Timestamps are encoded using R's native numeric representation for 'Date'/'POSIXct', which makes them fast to parse, but keeps them accessible with other software. The package provides tools for saving and updating series in this standardised format, for retrieving and joining data, for summarising files and directories, and for coercing series from and to other data types (such as zoo series).
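A minimal base-R illustration of the storage idea (not the package's own functions), assuming the zoo package is available for the final coercion:

    # A series is a two-column CSV whose timestamps are R's numeric
    # representation of Date values.
    x <- data.frame(time = as.numeric(Sys.Date() - 2:0),
                    value = c(1.5, 2.0, 1.8))
    write.csv(x, "series.csv", row.names = FALSE)

    # Read it back and coerce to a zoo series
    y <- read.csv("series.csv")
    zoo::zoo(y$value, order.by = as.Date(y$time, origin = "1970-01-01"))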
The Tweedie compound Poisson distribution is a mixture of a degenerate distribution at the origin and a continuous distribution on the positive real line. It has been applied in a wide range of fields in which continuous data with exact zeros regularly arise. The cplm package provides likelihood-based and Bayesian procedures for fitting common Tweedie compound Poisson linear models. In particular, models with hierarchical structures or extra zero inflation can be handled. Further, the package implements the Gini index based on an ordered version of the Lorenz curve as a robust model-comparison tool for zero-inflated and highly skewed distributions.
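A hedged sketch of fitting a compound Poisson GLM with the package (the data frame claims and its columns are hypothetical; check the documentation for the current arguments of cpglm()):

    library(cplm)

    # Tweedie compound Poisson GLM with a log link; `amount` is a
    # non-negative, zero-inflated response in the hypothetical data.
    fit <- cpglm(amount ~ age + region, link = "log", data = claims)
    summary(fit)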
The rema package implements a permutation-based approach for binary meta-analyses of 2x2 tables, founded on conditional logistic regression, that provides more reliable statistical tests when heterogeneity is observed in rare event data (Zabriskie et al. 2021 <doi:10.1002/sim.9142>). To adjust for the effect of heterogeneity, this method conditions on the sufficient statistic of a proxy for the heterogeneity effect as opposed to estimating the heterogeneity variance. While this results in the model not strictly falling under the random-effects framework, it is akin to a random-effects approach in that it assumes differences in variability due to treatment. Further, this method does not rely on large-sample approximations or continuity corrections for rare event data. This method uses the permutational distribution of the test statistic instead of asymptotic approximations for inference. The number of observed events drives the computational complexity of creating this permutational distribution. Accordingly, for this method to be computationally feasible, it should only be applied to meta-analyses with a relatively low number of observed events. To create this permutational distribution, a network algorithm, based on the work of Mehta et al. (1992) <doi:10.2307/1390598> and Corcoran et al. (2001) <doi:10.1111/j.0006-341x.2001.00941.x>, is employed using C++ and integrated into the package.
This package provides utilities to calculate the probabilities of various dice-rolling events, such as the probability of rolling a four-sided die six times and getting a 4, a 3, and either a 1 or 2 among the six rolls (in any order); the probability of rolling two six-sided dice three times and getting a 10 on the first roll, followed by a 4 on the second roll, followed by anything but a 7 on the third roll; or the probabilities of each possible sum of rolling five six-sided dice, dropping the lowest two rolls, and summing the remaining dice.
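A quick base-R Monte Carlo check of the first event described above (this is not the package's interface, only a sanity check of the kind of probability it computes):

    set.seed(42)
    hits <- replicate(1e5, {
      rolls <- sample(4, 6, replace = TRUE)   # six rolls of a four-sided die
      any(rolls == 4) && any(rolls == 3) && any(rolls %in% 1:2)
    })
    mean(hits)   # simulated probability of the compound event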
This package contains methods for fitting Generalized Linear Models (GLMs) and Generalized Additive Models (GAMs). Generalized regression models are common methods for handling data for which assuming Gaussian-distributed errors is not appropriate. For instance, if the response of interest is binary, count, or proportion data, one can instead model the expectation of the response based on an appropriate data-generating distribution. This package provides methods for fitting GLMs and GAMs under Beta regression, Poisson regression, Gamma regression, and Binomial regression (currently GLM only) settings. Models are fit using local scoring algorithms described in Hastie and Tibshirani (1990) <doi:10.1214/ss/1177013604>.
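A minimal base-R sketch of the iteratively reweighted least squares idea underlying local scoring, shown for Poisson regression with a log link; the package's local scoring implementation is more general (GAMs and additional families).

    irls_poisson <- function(X, y, iter = 25) {
      beta <- rep(0, ncol(X))
      for (i in seq_len(iter)) {
        eta <- drop(X %*% beta)
        mu  <- exp(eta)                       # inverse log link
        z   <- eta + (y - mu) / mu            # working response
        w   <- mu                             # working weights
        beta <- solve(crossprod(X, w * X), crossprod(X, w * z))
      }
      drop(beta)
    }

    X <- cbind(1, runif(100))
    y <- rpois(100, exp(0.5 + 1.2 * X[, 2]))
    cbind(irls = irls_poisson(X, y),
          glm  = coef(glm(y ~ X[, 2], family = poisson)))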
Detailed functionality for working with the univariate and multivariate Generalized Hyperbolic distribution and its special cases (Hyperbolic (hyp), Normal Inverse Gaussian (NIG), Variance Gamma (VG), skewed Student-t and Gaussian distributions). In particular, it contains fitting procedures, an AIC-based model selection routine, functions for the computation of density, quantile, probability, random variates and expected shortfall, some portfolio optimization and plotting routines, and the likelihood ratio test. In addition, it contains the Generalized Inverse Gaussian distribution. See Chapter 3 of A. J. McNeil, R. Frey, and P. Embrechts, Quantitative Risk Management: Concepts, Techniques and Tools, Princeton University Press, Princeton (2005).
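A hedged sketch of a typical fit, using simulated data as a stand-in for a real return series (function names follow the package's fit.NIGuv()/rghyp() interface; see the manual for the exact arguments):

    library(ghyp)

    ret <- rnorm(1000, 0, 0.02)   # stand-in for a return series
    fit <- fit.NIGuv(ret)         # fit the NIG special case
    summary(fit)
    rghyp(5, fit)                 # random variates from the fitted model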
This package provides functions for fitting and doing predictions with Gaussian process models using Vecchia's (1988) approximation. The package also includes functions for reordering input locations, finding ordered nearest neighbors (with help from the FNN package), grouping operations, and conditional simulations. Covariance functions for spatial and spatial-temporal data on Euclidean domains and spheres are provided. The original approximation is due to Vecchia (1988) <http://www.jstor.org/stable/2345768>, the reordering and grouping methods are from Guinness (2018) <doi:10.1080/00401706.2018.1437476>, and model fitting employs the Fisher scoring algorithm described in Guinness (2019) <doi:10.48550/arXiv.1905.08374>.
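A hedged sketch using the package's fit_model()/predictions() interface on simulated spatial data (argument names may differ slightly across versions):

    library(GpGp)

    n <- 500
    locs <- cbind(runif(n), runif(n))            # spatial locations
    y <- sin(4 * locs[, 1]) + rnorm(n, 0, 0.1)   # noisy response

    # Fit an isotropic Matern model with Vecchia's approximation
    fit <- fit_model(y, locs = locs, X = matrix(1, n, 1),
                     covfun_name = "matern_isotropic")

    # Predict at ten new locations (intercept-only trend)
    locs_new <- cbind(runif(10), runif(10))
    predictions(fit, locs_pred = locs_new, X_pred = matrix(1, 10, 1))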
Analysis, imputation, and multiple imputation of count data using covariates. LORI uses a log-linear Poisson model where main row and column effects, as well as effects of known covariates and interaction terms can be fitted. The estimation procedure is based on the convex optimization of the Poisson loss penalized by a Lasso type penalty and a nuclear norm. LORI returns estimates of main effects, covariate effects and interactions, as well as an imputed count table. The package also contains a multiple imputation procedure. The methods are described in Robin, Josse, Moulines and Sardy (2019) <doi:10.1016/j.jmva.2019.04.004>.
An embedded proximal interior point quadratic programming solver that can solve dense and sparse quadratic programs, described in Schwan, Jiang, Kuhn, and Jones (2023) <doi:10.48550/arXiv.2304.00290>. Combining an infeasible interior point method with the proximal method of multipliers, the algorithm can handle ill-conditioned convex quadratic programming problems without the need for linear independence of the constraints. The solver is written in header-only C++14, leveraging the Eigen library for vectorized linear algebra. For small dense problems, vectorized instructions and cache locality can be exploited more efficiently. Allocation-free problem updates and re-solves are also provided.
The advanced version of the package 's2dverification'. It is intended for seasonal-to-decadal (s2d) climate forecast verification, but it can also be used for other kinds of forecasts or general climate analysis. This package is specially designed for the comparison between experimental and observational datasets. The functionality of the included functions ranges from data retrieval and data post-processing to skill scores against observations and visualization. Compared to 's2dverification', 's2dv' is more compatible with the package 'startR', able to use multiple cores for computation, and handles multi-dimensional arrays with higher flexibility. The CDO version used in development is 1.9.8.
The aim of XINA is to determine which proteins exhibit similar patterns within and across experimental conditions, since proteins with co-abundance patterns may have common molecular functions. XINA imports multiple datasets, tags datasets in silico, and combines the data for subsequent subgrouping into multiple clusters. The result is a single output depicting the variation across all conditions. XINA not only extracts co-abundance profiles within and across experiments, but also incorporates protein-protein interaction databases and integrative resources such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) to infer interactors and molecular functions, respectively, and produces intuitive graphical outputs.
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
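A typical workflow on the package's built-in nhanes data: impute, analyse each completed data set, then pool the results with Rubin's rules.

    library(mice)

    imp <- mice(nhanes, m = 5, method = "pmm", seed = 123, printFlag = FALSE)
    fit <- with(imp, lm(chl ~ bmi + age))   # analyse each imputed data set
    summary(pool(fit))                      # pool with Rubin's rules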
This package implements the fast cross-validation via sequential testing (CVST) procedure. CVST is an improved cross-validation procedure that uses non-parametric testing coupled with sequential analysis to determine the best parameter set on linearly increasing subsets of the data. In addition to CVST, the package contains an implementation of ordinary k-fold cross-validation with a flexible and powerful set of helper objects and methods to handle the overall model selection process. The implementations of Cochran's Q test with permutations and the sequential testing framework of Wald are generic and can therefore also be used in other contexts.
Rclone is a command-line program to sync files and directories to and from different cloud storage providers.
Features include:
MD5/SHA1 hashes checked at all times for file integrity
Timestamps preserved on files
Partial syncs supported on a whole file basis
Copy mode to just copy new/changed files
Sync (one way) mode to make a directory identical
Check mode to check for file hash equality
Can sync to and from network, e.g., two different cloud accounts
Optional encryption (Crypt)
Optional cache (Cache)
Optional FUSE mount (rclone mount)
Estimates the area under the concentration-versus-time curve (AUC) and its standard error for data with Below the Limit of Quantification (BLOQ) observations. Two approaches are implemented: direct estimation using censored maximum likelihood, and first imputing the BLOQ observations using various methods and then computing the AUC and its standard error from the imputed data. Technical details can be found in Barnett, Helen Yvette, Helena Geys, Tom Jacobs, and Thomas Jaki, "Methods for Non-Compartmental Pharmacokinetic Analysis With Observations Below the Limit of Quantification", Statistics in Biopharmaceutical Research (2020): 1-12 (available online: <https://www.tandfonline.com/doi/full/10.1080/19466315.2019.1701546>).
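For orientation, a minimal base-R illustration of the underlying task (not the package's censored-likelihood method): a trapezoidal AUC computed after a naive LOQ/2 substitution for the BLOQ observation.

    time <- c(0, 0.5, 1, 2, 4, 8)
    conc <- c(0, 3.1, 4.2, 2.5, 0.9, NA)     # NA marks a BLOQ observation
    loq  <- 0.5
    conc[is.na(conc)] <- loq / 2             # naive substitution
    sum(diff(time) * (head(conc, -1) + tail(conc, -1)) / 2)   # trapezoidal AUC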
Estimate, assess, test, and study linear, nonlinear, hierarchical and multigroup structural equation models using composite-based approaches and procedures, including estimation techniques such as partial least squares path modeling (PLS-PM) and its derivatives (PLSc, ordPLSc, robustPLSc), generalized structured component analysis (GSCA), generalized structured component analysis with uniqueness terms (GSCAm), generalized canonical correlation analysis (GCCA), principal component analysis (PCA), factor score regression (FSR) using sum score, regression or Bartlett scores (including bias correction using Croon's approach), as well as several tests and typical postestimation procedures (e.g., verify admissibility of the estimates, assess the model fit, test the model fit, etc.).
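A hedged sketch of the lavaan-style syntax accepted by csem(); the data frame dat and its indicator columns are hypothetical, and PLS-PM is used as the default weighting approach.

    library(cSEM)

    model <- "
      # structural model
      eta2 ~ eta1
      # composite and common-factor measurement models
      eta1 <~ y11 + y12 + y13
      eta2 =~ y21 + y22
    "
    res <- csem(.data = dat, .model = model)
    assess(res)   # model fit and quality criteria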