Publicly available data from Medicare frequently requires extensive initial effort to extract desired variables and merge them; this package formalizes the techniques I've found work best. More information on the Medicare program, as well as guidance for the publicly available data this package targets, can be found on CMS's website covering publicly available data. See <https://www.cms.gov/Research-Statistics-Data-and-Systems/Research-Statistics-Data-and-Systems.html>.
This package provides a toolkit containing statistical analysis models motivated by multivariate forms of the Conway-Maxwell-Poisson (COM-Poisson) distribution for flexible modeling of multivariate count data, especially in the presence of data dispersion. Currently the package only supports bivariate data, via the bivariate COM-Poisson distribution described in Sellers et al. (2016) <doi:10.1016/j.jmva.2016.04.007>. Future development will extend the package to higher-dimensional data.
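As a rough illustration of how the COM-Poisson family handles dispersion, the sketch below evaluates the univariate COM-Poisson probability mass function, P(X = x) proportional to lambda^x / (x!)^nu, and checks how the dispersion parameter nu changes the variance relative to the mean. It covers only the univariate case, not the bivariate distribution the package implements; the function names and the truncation of the normalizing constant at 200 terms are assumptions made for this example.

```python
import math

def com_poisson_pmf_table(lam, nu, terms=200):
    """Return [P(X=0), ..., P(X=terms-1)] for a univariate COM-Poisson.

    The infinite normalizing constant is truncated at `terms` (an assumption
    that is adequate only when the tail mass beyond `terms` is negligible).
    """
    # Unnormalized log weights: x*log(lam) - nu*log(x!)
    log_w = [k * math.log(lam) - nu * math.lgamma(k + 1) for k in range(terms)]
    # Log-sum-exp for a numerically stable normalizing constant
    m = max(log_w)
    log_z = m + math.log(sum(math.exp(v - m) for v in log_w))
    return [math.exp(v - log_z) for v in log_w]

def mean_and_variance(probs):
    """Mean and variance of a distribution given as a probability table."""
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

# nu = 1 recovers the Poisson; nu > 1 is under-dispersed, nu < 1 over-dispersed
for nu in (0.5, 1.0, 2.0):
    mean, var = mean_and_variance(com_poisson_pmf_table(3.0, nu))
    print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f}")
```

With nu = 1 the table matches the ordinary Poisson distribution, which is a convenient sanity check for any implementation.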
This package provides functions for internal use by the spatial capture-recapture package secr (from version 5.4.0). The idea is to speed up the installation of secr, and possibly reduce its size. Initially the functions are those for area and transect search that use numerical integration code from RcppNumerical and RcppEigen. The functions are not intended to be user-friendly and require considerable preprocessing of data.
Fit a univariate-guided sparse regression (lasso), by a two-stage procedure. The first stage fits p separate univariate models to the response. The second stage gives more weight to the more important univariate features, and preserves their signs. Conveniently, it returns an object that inherits from class glmnet, so that all of the methods for glmnet are available. See Chatterjee, Hastie and Tibshirani (2025) <doi:10.1162/99608f92.c79ff6db> for details.
The Vega-Lite JavaScript framework provides a higher-level grammar for visual analysis, akin to ggplot or Tableau, that generates complete Vega specifications. Functions exist which enable building a valid spec from scratch or importing a previously created spec file. Functions also exist to export spec files and to generate code which will enable plots to be embedded in properly configured web pages. The default behavior is to generate an htmlwidget.
For distributions whose probability density functions are log-concave, the adaptive rejection sampling algorithm can be used to build envelope functions for sampling. For others, the modified adaptive rejection sampling algorithm, the concave-convex adaptive rejection sampling algorithm, and the adaptive slice sampling algorithm can be used. This R package mainly includes these four functions: rARS(), rMARS(), rCCARS(), and rASS(). These functions can realize sampling based on the algorithms above.
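The envelope idea behind adaptive rejection sampling can be sketched with a fixed, non-adaptive envelope. The example below is not the package's rARS(); it is a plain rejection sampler for the standard normal whose envelope is the upper hull of two tangent lines to the log-density at x = -1 and x = +1. A full adaptive sampler would add further tangent points as candidates are rejected. All function names here are hypothetical.

```python
import math
import random

def log_f(x):
    """Unnormalized log-density of the standard normal (log-concave)."""
    return -0.5 * x * x

def log_envelope(x):
    """Upper hull built from tangents to log_f at x = -1 and x = +1.

    The gap 0.5 - |x| - log_f(x) equals (|x| - 1)^2 / 2, so the hull
    never falls below log_f.
    """
    return 0.5 - abs(x)

def rejection_sample(n, seed=0):
    """Draw n samples from the standard normal by envelope rejection."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        # The normalized envelope is a Laplace(0, 1) density:
        # an Exponential(1) magnitude with a random sign.
        x = rng.expovariate(1.0)
        if rng.random() < 0.5:
            x = -x
        u = 1.0 - rng.random()  # uniform on (0, 1], safe for log
        # Accept with probability f(x) / envelope(x)
        if math.log(u) <= log_f(x) - log_envelope(x):
            draws.append(x)
    return draws

samples = rejection_sample(20000, seed=1)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(f"mean={sample_mean:.3f}, variance={sample_var:.3f}")
```

Because the envelope is piecewise exponential, sampling from it stays cheap, which is the property the adaptive algorithm exploits when it refines the hull.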
A BiocBook can be created by authors (e.g. R developers, but also scientists, teachers, communicators, ...) who wish to 1) write (compile a body of biological and/or bioinformatics knowledge), 2) containerize (provide Docker images to reproduce the examples illustrated in the compendium), 3) publish (deploy an online book to disseminate the compendium), and 4) version (automatically generate specific online book versions and Docker images for specific Bioconductor releases).
SNPediaR provides some tools for downloading and parsing data from the SNPedia web site <http://www.snpedia.com>. The implemented functions allow users to import the wiki text available in SNPedia pages and to extract the most relevant information out of them. If some information in the downloaded pages is not automatically processed by the library functions, users can easily implement their own parsers to access it in an efficient way.
This package provides a Bayesian data modeling scheme that performs four interconnected tasks: (i) characterizes the uncertainty of the elicited parametric prior; (ii) provides an exploratory diagnostic for checking prior-data conflict; (iii) computes the final statistical prior density estimate; and (iv) executes macro- and micro-inference. The primary reference is Mukhopadhyay, S. and Fletcher, D. (2018), "Generalized Empirical Bayes via Frequentist Goodness of Fit" (<https://www.nature.com/articles/s41598-018-28130-5>).
Cluster Evolution Analytics allows us to ask exploratory what-if questions, in the sense that the present information of an object is plugged into a dataset from a previous time frame so that we can explore its evolution (and that of its neighbors) to the present. See the URL for the papers associated with this package, for instance Morales-Oñate and Morales-Oñate (2024) <doi:10.1016/j.softx.2024.101921>.
Original ctsem (continuous time structural equation modelling) functionality, based on the OpenMx software, as described in Driver, Oud, Voelkle (2017) <doi:10.18637/jss.v077.i05>, with updated details in vignette. Combines stochastic differential equations representing latent processes with structural equation measurement models. This package is maintained for consistency with the original ctsem paper, but for the much newer and more capable ctsem package, see <https://cran.r-project.org/package=ctsem>.
Simulate and fit exponential multivariate Hawkes models. This package simulates a multivariate Hawkes model, introduced by Hawkes (1971) <doi:10.2307/2334319>, with an exponential kernel and fits the parameters from the data. Models with constant parameters, as well as complex dependent structures, can also be simulated and estimated. The estimation is based on the maximum likelihood method, introduced by Ozaki (1979) <doi:10.1007/BF02480272>, using the maxLik package.
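To make the simulation step concrete, here is a minimal sketch of a univariate Hawkes process with an exponential kernel, where the intensity is lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i)). It uses Ogata-style thinning, which is an assumption for this example and not necessarily the method the package uses; the multivariate case and the maximum likelihood fit are omitted, and the function name is hypothetical.

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning.

    mu is the baseline rate, alpha the jump in intensity per event and
    beta the exponential decay rate (alpha < beta for stationarity).
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # current value of the self-exciting term
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound for proposing the next candidate time.
        upper = mu + excitation
        wait = rng.expovariate(upper)
        excitation *= math.exp(-beta * wait)
        t += wait
        if t >= horizon:
            return events
        # Keep the candidate with probability lambda(t) / upper
        if rng.random() * upper <= mu + excitation:
            events.append(t)
            excitation += alpha

times = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=2000.0, seed=42)
# In the stationary regime the long-run event rate is mu / (1 - alpha / beta)
print(len(times) / 2000.0)
```

Tracking the excitation term recursively keeps each step constant-time, which only works because the kernel is exponential.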
This package contains an implementation of an independent component analysis (ICA) for grouped data. The main function groupICA() performs blind source separation by maximizing independence across sources, and allows adjusting for varying confounding across user-specified groups. Additionally, the package contains the function uwedge(), which can be used to approximately jointly diagonalize a list of matrices. For more details see the project website <https://sweichwald.de/groupICA/>.
The hotspots package is designed to look within a set of measured values of a variable and identify values that are disproportionately high based on both the deviance of any given value from a statistical distribution and its similarity to other values. Because this relative magnitude of each value is taken into account, a value that is a statistical outlier may not always be a hot spot if other values are similarly large.
This package provides a suite of multivariate methods and data visualization tools to implement profile analysis and cross-validation techniques described in Davison & Davenport (2002) <doi:10.1037/1082-989X.7.4.468>, Bulut (2013), and other published and unpublished resources. The package includes routines to perform criterion-related profile analysis, profile analysis via multidimensional scaling, moderated profile analysis, profile analysis by group, and a within-person factor model to derive score profiles.
An implementation of the Elston-Stewart algorithm for calculating pedigree likelihoods given genetic marker data (Elston and Stewart (1971) <doi:10.1159/000152448>). The standard algorithm is extended to allow inbred founders. pedprobr is part of the pedsuite, a collection of packages for pedigree analysis in R. In particular, pedprobr depends on pedtools for pedigree manipulations and pedmut for mutation modelling. For more information, see Pedigree Analysis in R (Vigeland, 2021, ISBN:9780128244302).
Generic code for estimating treatment effects with panel data. The idea is to split the work into separate steps: organizing the data, looping over groups and time periods, computing group-time average treatment effects, and aggregating group-time average treatment effects. Often, one is able to implement a new identification/estimation procedure by simply replacing the step on estimating group-time average treatment effects. See several different examples of this approach in the package documentation.
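The separation of steps can be sketched as follows. This is an illustrative outline under stated assumptions, not the package's interface: the data layout, the function names, the use of never-treated units (group 0) as controls, and the group-size weighting in the aggregation are all choices made for this example. The estimator is passed in as an argument, so a different procedure can be swapped in without touching the loop or the aggregation.

```python
from statistics import mean

def did_att(panel, group, g, t):
    """ATT(g, t) by difference-in-differences against never-treated units.

    panel maps unit -> {period: outcome}; group maps unit -> first treated
    period, with 0 meaning never treated. The base period is g - 1.
    """
    base = g - 1
    treated = [y[t] - y[base] for u, y in panel.items() if group[u] == g]
    control = [y[t] - y[base] for u, y in panel.items() if group[u] == 0]
    return mean(treated) - mean(control)

def group_time_effects(panel, group, periods, att_fn=did_att):
    """Loop over groups and post-treatment periods, applying att_fn."""
    effects = {}
    for g in sorted({v for v in group.values() if v != 0}):
        for t in periods:
            if t >= g:
                effects[(g, t)] = att_fn(panel, group, g, t)
    return effects

def aggregate(effects, group):
    """Average the group-time effects, weighting each by its group size."""
    sizes = {}
    for g in group.values():
        sizes[g] = sizes.get(g, 0) + 1
    total = sum(sizes[g] for (g, _t) in effects)
    return sum(sizes[g] * v for (g, _t), v in effects.items()) / total

# Toy panel: units a and b are never treated, c and d are treated from period 3
panel = {
    "a": {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0},
    "b": {1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0},
    "c": {1: 1.0, 2: 2.0, 3: 5.0, 4: 6.0},
    "d": {1: 0.0, 2: 1.0, 3: 4.0, 4: 5.0},
}
group = {"a": 0, "b": 0, "c": 3, "d": 3}
effects = group_time_effects(panel, group, periods=[1, 2, 3, 4])
overall = aggregate(effects, group)
print(effects, overall)
```

Replacing `did_att` with another function of the same signature is the only change needed to try a different estimator.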
This package provides an interface to sparsepp, a fast, memory-efficient hash map. It is derived from Google's excellent sparsehash implementation. We believe sparsepp provides an unparalleled combination of performance and memory usage, and will outperform your compiler's unordered_map on both counts. Only Google's dense_hash_map is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance).
Statistical pattern recognition and dating using archaeological artefact assemblages. Package of statistical tools for archaeology. hclustcompro()/perioclust(): Bellanger Lise, Coulon Arthur, Husi Philippe (2021, ISBN:978-3-030-60103-4). mapclust(): Bellanger Lise, Coulon Arthur, Husi Philippe (2021) <doi:10.1016/j.jas.2021.105431>. seriograph(): Desachy Bruno (2004) <doi:10.3406/pica.2004.2396>. cerardat(): Bellanger Lise, Husi Philippe (2012) <doi:10.1016/j.jas.2011.06.031>.
This package provides tools for the stochastic simulation of effectiveness scores to mitigate data-related limitations of Information Retrieval evaluation research, as described in Urbano and Nagler (2018) <doi:10.1145/3209978.3210043>. These tools include: fitting, selection and plotting distributions to model system effectiveness, transformation towards a prespecified expected value, proxy to fitting of copula models based on these distributions, and simulation of new evaluation data from these distributions and copula models.
This package provides a variational Bayesian finite mixture model for the clustering of categorical data, and can implement variable selection and semi-supervised outcome guiding if desired. Incorporates an option to perform model averaging over multiple initialisations to reduce the effects of local optima and improve the automatic estimation of the true number of clusters. For further details, see the paper by Rao and Kirk (2024) <doi:10.48550/arXiv.2406.16227>.
Graphical tools for visualizing high-dimensional data along a path of alternating one- and two-dimensional plots. Includes optional interactive graphics via loon (which uses tcltk from base R). Support is provided for constructing graph structures and, when available, plotting them with Bioconductor packages (e.g., graph, Rgraphviz); these are optional and examples/vignettes are skipped if they are not installed. For algorithms and further details, see <doi:10.18637/jss.v095.i04>.
This package finds SNV/indel differences between two closely related BAM files by pairwise comparison at each base position across the genomic region of interest. Differences are inferred with Fisher's exact test and Euclidean distance, using as input the base counts (A, T, G, C) at a given position and the read counts for indels that span at least 2 bp on both sides of the indel region.
This package contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library. All models return coda mcmc objects that can then be summarized using the coda package. Some useful utility functions such as density functions, pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.