Addresses the log-of-zero problem by developing a new family of estimators called iterated Ordinary Least Squares. This family nests standard approaches such as log-linear and Poisson regressions, offers several computational advantages, and corresponds to the correct way to perform the popular log(Y + 1) transformation. For more details about how to use it, see the notebook at: <https://www.davidbenatia.com/>.
This package provides list-processing utilities inspired by the SRFI-1 list library for Scheme (<https://srfi.schemers.org/srfi-1/srfi-1.html>), including car/cdr family accessors, zip, pairwise, for.each, pair.fold.right and friends. Higher-order helpers that are orthogonal to list processing are deferred to the 'functional' package; this package freely mixes implementation and API.
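The R functions themselves are not reproduced here. As a rough illustration of the SRFI-1 semantics behind pair.fold.right and zip, the following Python sketch uses hypothetical names (pair_fold_right, srfi_zip): the fold's combining function receives successive tails of the list rather than single elements, and zip truncates to the shortest input.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
A = TypeVar("A")


def pair_fold_right(kons: Callable[[List[T], A], A], knil: A, lst: List[T]) -> A:
    """Right fold over successive tails of lst, modeled on SRFI-1 pair-fold-right.

    Hypothetical Python helper, not the R package's API.
    """
    acc = knil
    # Start from the shortest tail so the full list is combined last.
    for i in range(len(lst) - 1, -1, -1):
        acc = kons(lst[i:], acc)
    return acc


def srfi_zip(*lists: List[T]) -> List[List[T]]:
    """SRFI-1 style zip: groups corresponding elements, stopping at the shortest list."""
    return [list(group) for group in zip(*lists)]


# Consing each tail onto the accumulator lists all tails, longest first.
print(pair_fold_right(lambda tail, acc: [tail] + acc, [], [1, 2, 3]))
print(srfi_zip([1, 2, 3], ["a", "b"]))
```

This mirrors the SRFI-1 example in which (pair-fold-right cons '() '(a b c)) yields ((a b c) (b c) (c)).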
This package implements the efficient algorithm by Ortmann and Brandes (2017) <doi:10.1007/s41109-017-0027-2> to compute the orbit-aware frequency distribution of induced and non-induced quads, i.e. subgraphs of size four. Given an edge matrix, a data frame, or a graph object (e.g., 'igraph'), the orbit-aware counts are computed for each edge and each node.
This package provides tools for performing disproportionality analysis using the information component, the proportional reporting ratio and the reporting odds ratio. The anticipated use is passing data to the da() function, which executes the disproportionality analysis. See Norén et al. (2011) <doi:10.1177/0962280211403604> and Montastruc et al. (2011) <doi:10.1111/j.1365-2125.2011.04037.x> for further details.
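To make the three measures concrete, here is a minimal Python sketch of the usual point estimates from a 2x2 table of spontaneous reports. It does not show the interface of da(); the function name and the +0.5 shrinkage in the information component are assumptions based on the commonly cited form, and the package may differ in details such as credibility intervals.

```python
import math


def disproportionality(a: int, b: int, c: int, d: int) -> dict:
    """Point estimates from a 2x2 report table (hypothetical helper).

    a: reports with the drug and the event
    b: reports with the drug but not the event
    c: reports with the event but not the drug
    d: reports with neither
    """
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))  # proportional reporting ratio
    ror = (a * d) / (b * c)  # reporting odds ratio
    expected = (a + b) * (a + c) / n  # expected count if drug and event were independent
    ic = math.log2((a + 0.5) / (expected + 0.5))  # information component, assumed +0.5 shrinkage
    return {"prr": prr, "ror": ror, "expected": expected, "ic": ic}


print(disproportionality(10, 90, 20, 880))
```

With these illustrative counts the expected count is 3, so the observed 10 reports give a positive information component of log2(10.5 / 3.5) = log2(3).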
This package provides a wrapper around the generic coordinate transformation software PROJ that transforms coordinates from one coordinate reference system ('CRS') to another. This includes cartographic projections as well as geodetic transformations. The intention is for this package to be used by downstream packages such as 'reproj', while the older PROJ.4 and version 5 pathways are provided by the 'proj4' package.
This package provides tooling to group dates by a variety of periods, including yearly, monthly, by second, by week of the month, and more. The groups are defined so that they also represent the distance between dates in terms of the period. This information can then be used in further calculations that rely on a specific temporal spacing between observations.
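The idea that group labels double as distances can be sketched in Python with two hypothetical helpers, month_index and week_index, that count whole periods from an origin date. This is an illustration of the concept only, not the package's grouping rules, which may handle origins and partial periods differently.

```python
from datetime import date


def month_index(d: date, origin: date) -> int:
    """Calendar months from origin's month to d's month (hypothetical helper)."""
    return (d.year - origin.year) * 12 + (d.month - origin.month)


def week_index(d: date, origin: date) -> int:
    """Whole 7-day periods elapsed since origin (hypothetical helper)."""
    return (d - origin).days // 7


dates = [date(2020, 1, 15), date(2020, 3, 1), date(2021, 1, 1)]
origin = min(dates)
# The difference between two labels is the distance in that period unit.
print([month_index(d, origin) for d in dates])
print([week_index(d, origin) for d in dates])
```

Because each label is an integer count of periods, subtracting two labels directly gives the spacing between observations (here 2 months between the first two dates).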
Cluster ensembles are collections of individual solutions to a given clustering problem which are useful or necessary to consider in a wide range of applications. This R package provides an extensible computational environment for creating and analyzing cluster ensembles, with basic data structures for representing partitions and hierarchies, and facilities for computing on them, including methods for measuring proximity and obtaining consensus and secondary clusterings.
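One widely used way to measure proximity across the members of a cluster ensemble is the co-association matrix: the fraction of partitions that place two objects in the same cluster. The Python sketch below, with the hypothetical name co_association, shows that idea; it is not necessarily one of the proximity measures this package implements.

```python
from typing import List, Sequence


def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Share of partitions that assign objects i and j to the same cluster.

    Each partition is a sequence of cluster labels, one per object
    (hypothetical helper, not the package API).
    """
    n = len(partitions[0])
    m = len(partitions)
    mat = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    mat[i][j] += 1.0 / m
    return mat


ensemble = [[0, 0, 1, 1], [0, 0, 0, 1]]
for row in co_association(ensemble):
    print(row)
```

One minus this matrix can serve as a dissimilarity from which a consensus clustering is derived, for example by hierarchical clustering.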
Cross-validate one or multiple regression and classification models and get relevant evaluation metrics in a tidy format. Validate the best model on a test set and compare it to a baseline evaluation. Alternatively, evaluate predictions from an external model. Currently supports regression and classification (binary and multiclass). Described in chapter 5 of Jeyaraman, B. P., Olsen, L. R., & Wambugu, M. (2019, ISBN: 9781838550134).
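The general workflow of k-fold cross-validation against a baseline can be sketched in plain Python. The helper names, the simple linear model, and the use of the training-fold mean as the baseline are all illustrative assumptions; the package's own baseline evaluation and metrics are richer than this.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def fold_ids(n: int, k: int, seed: int = 0) -> List[int]:
    """Balanced, shuffled fold assignment (hypothetical helper)."""
    ids = [i % k for i in range(n)]
    random.Random(seed).shuffle(ids)
    return ids


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def cross_validate(xs: Sequence[float], ys: Sequence[float], k: int = 5,
                   seed: int = 0) -> Dict[str, float]:
    """Average held-out RMSE of a linear model and of a mean-only baseline."""
    folds = fold_ids(len(xs), k, seed)
    model_scores, baseline_scores = [], []
    for f in range(k):
        train = [i for i in range(len(xs)) if folds[i] != f]
        test = [i for i in range(len(xs)) if folds[i] == f]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mean_y = sum(ys[i] for i in train) / len(train)  # baseline ignores x
        obs = [ys[i] for i in test]
        model_scores.append(rmse(obs, [a + b * xs[i] for i in test]))
        baseline_scores.append(rmse(obs, [mean_y] * len(test)))
    return {"model_rmse": sum(model_scores) / k,
            "baseline_rmse": sum(baseline_scores) / k}


xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
print(cross_validate(xs, ys))
```

Comparing against a baseline shows whether a model beats a trivial predictor on held-out data, which a low error on its own does not establish.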
Duplicated publication data (pre-processed and formatted) for entity resolution. This data set contains a total of 1879 records. The following variables are included in the data set: id, title, book title, authors, address, date, year, editor, journal, volume, pages, publisher, institution, type, tech, note. The data set is accompanied by a gold-standard data set that indicates, by id, which records match.
Google's Compact Language Detector 3 is a neural network model for language identification and the successor of 'cld2' (available from CRAN). The algorithm is still experimental and takes a novel approach to language detection, with different properties and outcomes. It can be useful to combine it with the results of the Bayesian classifier in 'cld2'. See <https://github.com/google/cld3#readme> for more information.
This package provides a robust method for identifying differential binding sites in ChIP-seq (Chromatin Immunoprecipitation Sequencing) data when comparing two samples. It uses an ensemble of finite mixture models combined with a local false discovery rate (fdr), allowing for flexible modeling of the data. Methods for Differential Identification using Mixture Ensemble (DIME) are described in: Taslim et al. (2011) <doi:10.1093/bioinformatics/btr165>.
The goal of dndR is to provide a suite of Dungeons & Dragons related functions. This package is meant to be useful both to players and Dungeon Masters (DMs). Some functions apply to many tabletop role-playing games (e.g., dice rolling), but others are focused on Fifth Edition (a.k.a. "5e") and where possible both the 2014 and 2024 versions are supported.
Set of functions for Data Envelopment Analysis, including classical, fuzzy, cross-efficiency, bootstrapping, and Malmquist models. See: Banker, R.; Charnes, A.; Cooper, W.W. (1984). <doi:10.1287/mnsc.30.9.1078>, Charnes, A.; Cooper, W.W.; Rhodes, E. (1978). <doi:10.1016/0377-2217(78)90138-8> and Charnes, A.; Cooper, W.W.; Rhodes, E. (1981). <doi:10.1287/mnsc.27.6.668>.
Regular and non-regular Fractional Factorial 2-level designs can be created. Furthermore, analysis tools for Fractional Factorial designs with 2-level factors are offered (main effects and interaction plots for all factors simultaneously, cube plot for looking at the simultaneous effects of three factors, full or half normal plot, alias structure in a more readable format than with the built-in function alias).
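As a small illustration of a regular two-level fractional factorial and its alias structure, the Python sketch below builds the 2^(4-1) half fraction from a full factorial in A, B, C with the generator D = ABC. The function name is hypothetical and the design is a textbook example, not output from the package.

```python
from itertools import product


def half_fraction_2_4():
    """2^(4-1) design: full factorial in A, B, C plus D = A*B*C (coded -1/+1)."""
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


design = half_fraction_2_4()
for run in design:
    print(run)

# Defining relation I = ABCD: the product of all four columns is +1 in every run.
# Hence main effects are aliased with three-factor interactions (A = BCD, ...)
# and two-factor interactions are aliased in pairs (AB = CD, AC = BD, AD = BC).
assert all(a * b * c * d == 1 for a, b, c, d in design)
```

Eight runs estimate four main effects only because each is confounded with an interaction that is usually assumed negligible; the alias structure makes that trade-off explicit.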
This package provides functions commonly needed in bioinformatic analysis, such as calculating linear principal components from numeric data and single-nucleotide polymorphism (SNP) datasets, calculating the fixation index (Fst) using the Hudson method, creating scatter plots in three views, handling the PLINK binary file format, detecting rough structures and outliers using unsupervised clustering, and performing faster matrix multiplication for big data.
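For reference, a frequently cited form of Hudson's Fst estimator (as written by Bhatia et al., 2013) combines loci as a ratio of averages. The Python sketch below implements that formula under the assumption that it matches what the package calls the Hudson method; the helper name is hypothetical, and n1, n2 are the sample sizes entering the small-sample correction.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson Fst over loci as sum(numerators) / sum(denominators).

    p1, p2: allele frequencies per locus in populations 1 and 2
    n1, n2: sample sizes used in the bias correction (hypothetical helper)
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference minus the expected sampling noise.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles, one drawn from each population, differ.
        den += a * (1 - b) + b * (1 - a)
    return num / den


print(hudson_fst([0.1, 0.9], [0.9, 0.1], 101, 101))
```

Averaging numerators and denominators separately, rather than averaging per-locus ratios, keeps loci with low diversity from dominating the estimate; the bias correction can make the estimate slightly negative when the populations do not differ.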
Fit multilevel manifest or latent time-series models, including popular Dynamic Structural Equation Models (DSEM). The models can be set up and modified with user-friendly functions and are fit to the data using Stan for Bayesian inference. Path models and formulas for user-defined models can be easily created with functions using 'knitr'. See Asparouhov, Hamaker, & Muthen (2018) <doi:10.1080/10705511.2017.1406803>.
Estimates power, minimum detectable effect size (MDES) and sample size requirements. The context is multilevel randomized experiments with multiple outcomes. The estimation takes into account the use of multiple testing procedures. Development of this package was supported by a grant from the Institute of Education Sciences (R305D170030). For a full package description, including a detailed technical appendix, see <doi:10.18637/jss.v108.i06>.
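The way multiple testing inflates the minimum detectable effect size can be seen in the standard normal-approximation formula MDES = (z_{1-alpha/2} + z_{power}) x SE. The Python sketch below applies a Bonferroni split of alpha across m outcomes. It is a deliberately simplified single-level illustration with hypothetical helper names, not the package's multilevel estimator or its other multiple testing procedures.

```python
from statistics import NormalDist


def mdes_multiplier(alpha: float = 0.05, power: float = 0.80, m: int = 1) -> float:
    """Two-sided MDES multiplier, Bonferroni-adjusting alpha over m outcomes."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / (2 * m)) + z.inv_cdf(power)


def mdes(se: float, alpha: float = 0.05, power: float = 0.80, m: int = 1) -> float:
    """Minimum detectable effect size = multiplier x standard error of the impact estimate."""
    return mdes_multiplier(alpha, power, m) * se


print(round(mdes_multiplier(), 3))     # single outcome
print(round(mdes_multiplier(m=3), 3))  # three outcomes, Bonferroni
```

With one outcome the multiplier is about 2.80; testing three outcomes with Bonferroni raises it, which is why accounting for the testing procedure matters when planning sample sizes.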
A PEP, or Portable Encapsulated Project, is a dataset that subscribes to the PEP structure for organizing metadata. Written in a simple YAML + CSV format, it is a one-stop solution to metadata management across data analysis environments. This package reads this standardized project configuration structure into R. Described in Sheffield et al. (2021) <doi:10.1093/gigascience/giab077>.
Several functions introduced in Aster et al.'s book on inverse theory. The functions are often translations of MATLAB code developed by the authors to illustrate concepts of inverse theory as applied to geophysics. Generalized inversion, tomographic inversion algorithms (conjugate gradients, 'ART' and 'SIRT'), non-linear least squares, first- and second-order Tikhonov regularization, roughness constraints, and procedures for estimating smoothing parameters are included.
This package provides an easy-to-use yet adaptable set of tools to conduct person-centered analysis using a two-step clustering procedure. As described in Bergman and El-Khouri (1999) <DOI:10.1002/(SICI)1521-4036(199910)41:6%3C753::AID-BIMJ753%3E3.0.CO;2-K>, hierarchical clustering is performed to determine the initial partition for the subsequent k-means clustering procedure.
An implementation of the "Design Analysis" proposed by Gelman and Carlin (2014) <doi:10.1177/1745691614551642>. It combines power analysis with the evaluation of other inferential risks, such as Type M (magnitude) and Type S (sign) errors. See also Altoè et al. (2020) <doi:10.3389/fpsyg.2019.02893> and Bertoldo et al. (2020) <doi:10.31234/osf.io/q9f86>.
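The quantities involved in design analysis can be sketched for a normally distributed estimate with known standard error, following the logic Gelman and Carlin (2014) describe: power and the Type S rate have closed forms, while the Type M (exaggeration) ratio is estimated here by simulation. The function name, the normal model, and the simulation settings are assumptions for illustration; this is not the package's interface.

```python
import random
from statistics import NormalDist


def design_analysis(true_effect: float, se: float, alpha: float = 0.05,
                    n_sims: int = 100_000, seed: int = 1) -> dict:
    """Power, Type S and Type M for an estimate ~ Normal(true_effect, se).

    Assumes a positive true effect (hypothetical helper).
    """
    if true_effect <= 0:
        raise ValueError("this sketch assumes a positive true effect")
    norm = NormalDist()
    z = norm.inv_cdf(1 - alpha / 2)
    lam = true_effect / se
    p_hi = 1 - norm.cdf(z - lam)  # significant with the correct sign
    p_lo = norm.cdf(-z - lam)  # significant with the wrong sign
    power = p_hi + p_lo
    type_s = p_lo / power  # share of significant results with the wrong sign
    # Type M: mean absolute significant estimate relative to the true effect.
    rng = random.Random(seed)
    draws = (rng.gauss(true_effect, se) for _ in range(n_sims))
    sig = [abs(est) for est in draws if abs(est) > z * se]
    type_m = (sum(sig) / len(sig)) / true_effect
    return {"power": power, "type_s": type_s, "type_m": type_m}


print(design_analysis(2.8, 1.0))  # well-powered design
print(design_analysis(0.5, 1.0))  # underpowered design
```

In the underpowered case, significant estimates overstate the true effect several-fold and occasionally have the wrong sign, which is the risk that power analysis alone does not reveal.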
This package provides convenience utilities for using DuckDB directly over datasets stored in Azure Data Lake Storage Gen2 (ADLS Gen2, 'abfss://'). Opens connections configured for Azure-backed Delta Lake and Parquet data, registers Azure credentials as DuckDB secrets, and supports optional repository mirrors for restricted networks. Integrates well with DBI for SQL workflows and with dplyr and dbplyr for lazy table queries.
This package implements the basic elements of the multi-model inference paradigm for up to twenty species-area relationship models (SAR), using simple R list-objects and functions, as in Triantis et al. 2012 <DOI:10.1111/j.1365-2699.2011.02652.x>. The package is scalable and users can easily create their own model and data objects. Additional SAR related functions are provided.
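A core step in multi-model inference is turning information-criterion values into model weights and averaging predictions with them. The Python sketch below computes Akaike weights and a weighted prediction; the model names and AIC values are made-up placeholders, and whether the package uses AIC, AICc, or another criterion is not assumed here.

```python
import math
from typing import Dict


def akaike_weights(aic: Dict[str, float]) -> Dict[str, float]:
    """Relative likelihoods exp(-delta/2), normalized to sum to one."""
    best = min(aic.values())
    rel = {name: math.exp(-(value - best) / 2) for name, value in aic.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


def averaged_prediction(preds: Dict[str, float], weights: Dict[str, float]) -> float:
    """Model-averaged prediction: weight-sum of each model's prediction."""
    return sum(weights[name] * preds[name] for name in weights)


# Hypothetical AIC values and predicted richness for three SAR models.
aic = {"power": 100.0, "logistic": 102.0, "linear": 110.0}
w = akaike_weights(aic)
print(w)
print(averaged_prediction({"power": 40.0, "logistic": 35.0, "linear": 50.0}, w))
```

Averaging across models avoids committing to a single SAR shape when several fit comparably well.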
Routines to write, simulate, and validate stock-flow consistent (SFC) models. The accounting structure of SFC models is described in Godley and Lavoie (2007, ISBN:978-1-137-08599-3). The algorithms implemented to solve the models (Gauss-Seidel and Broyden) are described in Kinsella and O'Shea (2010) <doi:10.2139/ssrn.1729205> and Peressini and Sullivan (1988, ISBN:0-387-96614-5).
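To show how Gauss-Seidel solves a period of an SFC model, the Python sketch below uses a small assumed system in the style of Godley and Lavoie's simplest model (government spending G, taxes T, disposable income YD, consumption C, household money H). Each sweep evaluates the equations in order using values already updated in that sweep, and the lagged stock carries the solution from one period to the next. The equations, parameter values, and helper names are illustrative assumptions, and Broyden's method is not shown.

```python
from typing import Dict, List


def solve_period(h_lag: float, g: float = 20.0, theta: float = 0.2,
                 alpha1: float = 0.6, alpha2: float = 0.4,
                 tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Gauss-Seidel solution of one period of an assumed SIM-style model:

        Y = C + G,  T = theta*Y,  YD = Y - T,
        C = alpha1*YD + alpha2*H_lag,  H = H_lag + YD - C
    """
    v = {"Y": 0.0, "T": 0.0, "YD": 0.0, "C": 0.0, "H": h_lag}
    for _ in range(max_iter):
        old = dict(v)
        # Evaluate equations in order, reusing values updated earlier in this sweep.
        v["T"] = theta * v["Y"]
        v["YD"] = v["Y"] - v["T"]
        v["C"] = alpha1 * v["YD"] + alpha2 * h_lag
        v["Y"] = v["C"] + g
        v["H"] = h_lag + v["YD"] - v["C"]
        if max(abs(v[k] - old[k]) for k in v) < tol:
            return v
    raise RuntimeError("Gauss-Seidel did not converge")


def simulate(periods: int = 200) -> List[float]:
    """Solve successive periods, carrying the money stock H forward."""
    h = 0.0
    path = []
    for _ in range(periods):
        v = solve_period(h)
        path.append(v["Y"])
        h = v["H"]
    return path


path = simulate()
print(round(path[0], 4), round(path[-1], 4))
```

With these parameters, first-period output is G / (1 - alpha1 * (1 - theta)) = 20 / 0.52, and output converges toward the steady state G / theta = 100 as the money stock accumulates.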