Duplicated publication data (pre-processed and formatted) for entity resolution. This data set contains a total of 1879 records. The following variables are included in the data set: id, title, book title, authors, address, date, year, editor, journal, volume, pages, publisher, institution, type, tech, note. The data set has a respective gold data set that provides information on which records match based on id.
Google's Compact Language Detector 3 is a neural network model for language identification and the successor of cld2 (available from CRAN). The algorithm is still experimental and takes a novel approach to language detection with different properties and outcomes. It can be useful to combine this with the Bayesian classifier results from cld2'. See <https://github.com/google/cld3#readme> for more information.
The goal of dndR is to provide a suite of Dungeons & Dragons related functions. This package is meant to be useful both to players and Dungeon Masters (DMs). Some functions apply to many tabletop role-playing games (e.g., dice rolling), but others are focused on Fifth Edition (a.k.a. "5e") and where possible both the 2014 and 2024 versions are supported.
Set of functions for Data Envelopment Analysis, including classical, fuzzy, cross-efficiency, bootstrapping, and Malmquist models. See: Banker, R.; Charnes, A.; Cooper, W.W. (1984). <doi:10.1287/mnsc.30.9.1078>, Charnes, A.; Cooper, W.W.; Rhodes, E. (1978). <doi:10.1016/0377-2217(78)90138-8> and Charnes, A.; Cooper, W.W.; Rhodes, E. (1981). <doi:10.1287/mnsc.27.6.668>.
This package provides a robust identification of differential binding sites method for analyzing ChIP-seq (Chromatin Immunoprecipitation Sequencing) comparing two samples that considers an ensemble of finite mixture models combined with a local false discovery rate (fdr) allowing for flexible modeling of data. Methods for Differential Identification using Mixture Ensemble (DIME) is described in: Taslim et al., (2011) <doi:10.1093/bioinformatics/btr165>.
Regular and non-regular Fractional Factorial 2-level designs can be created. Furthermore, analysis tools for Fractional Factorial designs with 2-level factors are offered (main effects and interaction plots for all factors simultaneously, cube plot for looking at the simultaneous effects of three factors, full or half normal plot, alias structure in a more readable format than with the built-in function alias).
This package provides useful functions which are needed for bioinformatic analysis such as calculating linear principal components from numeric data and Single-nucleotide polymorphism (SNP) dataset, calculating fixation index (Fst) using Hudson method, creating scatter plots in 3 views, handling with PLINK binary file format, detecting rough structures and outliers using unsupervised clustering, and calculating matrix multiplication in the faster way for big data.
Fit multilevel manifest or latent time-series models, including popular Dynamic Structural Equation Models (DSEM). The models can be set up and modified with user-friendly functions and are fit to the data using Stan for Bayesian inference. Path models and formulas for user-defined models can be easily created with functions using knitr'. Asparouhov, Hamaker, & Muthen (2018) <doi:10.1080/10705511.2017.1406803>.
Implementation of various statistical models for multivariate event history data <doi:10.1007/s10985-013-9244-x>. Including multivariate cumulative incidence models <doi:10.1002/sim.6016>, and bivariate random effects probit models (Liability models) <doi:10.1016/j.csda.2015.01.014>. Modern methods for survival analysis, including regression modelling (Cox, Fine-Gray, Ghosh-Lin, Binomial regression) with fast computation of influence functions.
This package provides an easy-to-use yet adaptable set of tools to conduct person-center analysis using a two-step clustering procedure. As described in Bergman and El-Khouri (1999) <DOI:10.1002/(SICI)1521-4036(199910)41:6%3C753::AID-BIMJ753%3E3.0.CO;2-K>, hierarchical clustering is performed to determine the initial partition for the subsequent k-means clustering procedure.
Several functions introduced in Aster et al.'s book on inverse theory. The functions are often translations of MATLAB code developed by the authors to illustrate concepts of inverse theory as applied to geophysics. Generalized inversion, tomographic inversion algorithms (conjugate gradients, ART and SIRT'), non-linear least squares, first and second order Tikhonov regularization, roughness constraints, and procedures for estimating smoothing parameters are included.
This package provides a PEP, or Portable Encapsulated Project, is a dataset that subscribes to the PEP structure for organizing metadata. It is written using a simple YAML + CSV format, it is your one-stop solution to metadata management across data analysis environments. This package reads this standardized project configuration structure into R. Described in Sheffield et al. (2021) <doi:10.1093/gigascience/giab077>.
Estimates power, minimum detectable effect size (MDES) and sample size requirements. The context is multilevel randomized experiments with multiple outcomes. The estimation takes into account the use of multiple testing procedures. Development of this package was supported by a grant from the Institute of Education Sciences (R305D170030). For a full package description, including a detailed technical appendix, see <doi:10.18637/jss.v108.i06>.
An implementation of the "Design Analysis" proposed by Gelman and Carlin (2014) <doi:10.1177/1745691614551642>. It combines the evaluation of Power-Analysis with other inferential-risks as Type-M error (i.e. Magnitude) and Type-S error (i.e. Sign). See also Altoè et al. (2020) <doi:10.3389/fpsyg.2019.02893> and Bertoldo et al. (2020) <doi:10.31234/osf.io/q9f86>.
This package implements the basic elements of the multi-model inference paradigm for up to twenty species-area relationship models (SAR), using simple R list-objects and functions, as in Triantis et al. 2012 <DOI:10.1111/j.1365-2699.2011.02652.x>. The package is scalable and users can easily create their own model and data objects. Additional SAR related functions are provided.
Fits group-regularized generalized linear models (GLMs) using the spike-and-slab group lasso (SSGL) prior of Bai et al. (2022) <doi:10.1080/01621459.2020.1765784> and extended to GLMs by Bai (2023) <doi:10.48550/arXiv.2007.07021>. This package supports fitting the SSGL model for the following GLMs with group sparsity: Gaussian linear regression, binary logistic regression, and Poisson regression.
Routines to write, simulate, and validate stock-flow consistent (SFC) models. The accounting structure of SFC models are described in Godley and Lavoie (2007, ISBN:978-1-137-08599-3). The algorithms implemented to solve the models (Gauss-Seidel and Broyden) are described in Kinsella and O'Shea (2010) <doi:10.2139/ssrn.1729205> and Peressini and Sullivan (1988, ISBN:0-387-96614-5).
Inferring causation from time series data through empirical dynamic modeling (EDM), with methods such as convergent cross mapping from Sugihara et al. (2012) <doi:10.1126/science.1227079>, partial cross mapping as outlined in Leng et al. (2020) <doi:10.1038/s41467-020-16238-0>, and cross mapping cardinality as described in Tao et al. (2023) <doi:10.1016/j.fmre.2023.01.007>.
This package provides functions for attaching tags to R objects, searching for objects based on tags, and removing tags from objects. It also includes a function for removing all tags from an object, as well as a function for deleting all objects with a specific tag from the R environment. The package is useful for organizing and managing large collections of objects in R.
Semiparametric modeling of lifetime data with crossing survival curves via Yang and Prentice model with baseline hazard/odds modeled with Bernstein polynomials. Details about the model can be found in Demarqui et al. (2019) <arXiv:1910.04475>. Model fitting can be carried out via both maximum likelihood and Bayesian approaches. The package also provides point and interval estimation for the crossing survival times.
This package provides tooling to group dates by a variety of periods including: yearly, monthly, by second, by week of the month, and more. The groups are defined in such a way that they also represent the distance between dates in terms of the period. This extracts valuable information that can be used in further calculations that rely on a specific temporal spacing between observations.
Cluster ensembles are collections of individual solutions to a given clustering problem which are useful or necessary to consider in a wide range of applications. This R package provides an extensible computational environment for creating and analyzing cluster ensembles, with basic data structures for representing partitions and hierarchies, and facilities for computing on them, including methods for measuring proximity and obtaining consensus and secondary clusterings.
An R interface for the remote file hosting service Box (<https://www.box.com/>). In addition to uploading and downloading files, this package includes functions which mirror base R operations for local files, (e.g. box_load(), box_save(), box_read(), box_setwd(), etc.), as well as git style functions for entire directories (e.g. box_fetch(), box_push()).
This package implements v2 of the B.L.S. API for requests of survey information and time series data through 3-tiered API that allows users to interact with the raw API directly, create queries through a functional interface, and re-shape the data structures returned to fit common uses. The API definition is located at: <https://www.bls.gov/developers/api_signature_v2.htm>.