This package performs prediction of a response function from simulated response values, allowing black-box optimization of functions estimated with some error. It includes a simple user interface for such applications, as well as more specialized functions designed to be called by the Migraine software (Rousset and Leblois, 2012 <doi:10.1093/molbev/MSR262>; Leblois et al., 2014 <doi:10.1093/molbev/msu212>; and see URL). The latter functions are used for prediction of likelihood surfaces and implied likelihood ratio confidence intervals, and for exploration of predictor space of the surface. Prediction of the response is based on ordinary Kriging (with residual error) of the input. Estimation of smoothing parameters is performed by generalized cross-validation.
Implementing algorithms and fitting models when sites (possibly remote) share computation summaries rather than actual data over HTTP with a master R process (using opencpu, for example). A stratified Cox model and a singular value decomposition are provided; the underlying Cox model code is derived from that in the R survival package. Sites may provide data via several means: CSV files, the REDCap API, etc. An extensible design allows new methods to be added in the future and includes facilities for local prototyping and testing. Web applications are provided (via shiny) for the implemented methods to help in designing and deploying the computations.
This package provides functions that calculate common types of splitting criteria used in random forests for classification problems, as well as functions that make predictions based on a single tree or a Forest-R.K. model; the package also provides functions to generate an importance plot for a Forest-R.K. model, as well as a 2D multidimensional-scaling plot of data points colour-coded by the class labels predicted for them by the Forest-R.K. model. This package is based on: Bernard, S., Heutte, L., Adam, S. (2008, ISBN:978-3-540-85983-3) "Forest-R.K.: A New Random Forest Induction Method", Fourth International Conference on Intelligent Computing, September 2008, Shanghai, China, pp. 430-437.
Estimates the probability of informed trading (PIN) initially introduced by Easley et al. (1996) <doi:10.1111/j.1540-6261.1996.tb04074.x>. A contribution of the package is that it uses the likelihood factorizations of Easley et al. (2010) <doi:10.1017/S0022109010000074> (EHO factorization) and Lin and Ke (2011) <doi:10.1016/j.finmar.2011.03.001> (LK factorization). Moreover, the package offers different estimation algorithms: specifically, the grid-search algorithm proposed by Yan and Zhang (2012) <doi:10.1016/j.jbankfin.2011.08.003>, and the hierarchical agglomerative clustering approach proposed by Gan et al. (2015) <doi:10.1080/14697688.2015.1023336> and later extended by Ersan and Alici (2016) <doi:10.1016/j.intfin.2016.04.001>.
Bayesian estimation and analysis methods for Probit Unfolding Models (PUMs), a novel class of scaling models designed for binary preference data. These models allow for both monotonic and non-monotonic response functions. The package supports Bayesian inference for both static and dynamic PUMs using Markov chain Monte Carlo (MCMC) algorithms with minimal or no tuning. Key functionalities include posterior sampling, hyperparameter selection, data preprocessing, model fit evaluation, and visualization. The methods are particularly suited to analyzing voting data, such as from the U.S. Congress or Supreme Court, but can also be applied in other contexts where non-monotonic responses are expected. For methodological details, see Shi et al. (2025) <doi:10.48550/arXiv.2504.00423>.
Conducts simulation-based, customized power calculations for clustered time-to-event data in a mixed crossed/nested design, where a number of cell lines and a number of mice within each cell line are considered to achieve a desired statistical power, motivated by Eckel-Passow and colleagues (2021) <doi:10.1093/neuonc/noab137> and Li and colleagues (2024) <doi:10.48550/arXiv.2404.08927>. This package provides two commonly used models for powering a design: linear mixed effects and the Cox frailty model. Both models account for within-subject (cell line) correlation while holding different distributional assumptions about the outcome. Alternatively, fixed-effects counterparts of both models are also available, and these produce similar estimates of statistical power.
It provides versatile tools for the analysis of birth-and-death-based Markovian queueing models and of single- and multi-class product-form queueing networks. It implements M/M/1, M/M/c, M/M/Infinite, M/M/1/K, M/M/c/K, M/M/c/c, M/M/1/K/K, M/M/c/K/K, M/M/c/K/m, M/M/Infinite/K/K, multiple-channel open Jackson networks, multiple-channel closed Jackson networks, single-channel multiple-class open networks, single-channel multiple-class closed networks and single-channel multiple-class mixed networks. It also provides Erlang-B, Erlang-C and Engset calculators. This work is dedicated to the memory of D. Sixto Rios Insua.
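As a hedged sketch of the intended workflow (function and argument names follow the package manual but should be verified against the documentation), an M/M/1 model can be built and summarized in a few lines:

  library(queueing)
  # M/M/1 queue: arrival rate lambda = 2, service rate mu = 3
  inp <- NewInput.MM1(lambda = 2, mu = 3, n = 0)
  mod <- QueueingModel(inp)  # checks the input and computes performance measures
  summary(mod)               # utilization, mean queue length, mean waiting time, ...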
SqueezeMeta is a versatile pipeline for the automated analysis of metagenomics/metatranscriptomics data (<https://github.com/jtamames/SqueezeMeta>). This package provides functions for loading SqueezeMeta results into R, filtering them based on different criteria, and visualizing the results using basic plots. The SqueezeMeta project (and any subsets of it generated by the different filtering functions) is parsed into a single object, whose different components (e.g. tables with the taxonomic or functional composition across samples, or contig/gene abundance profiles) can be easily analyzed using other R packages such as vegan or DESeq2. The methods in this package are further described in Puente-Sánchez et al. (2020) <doi:10.1186/s12859-020-03703-2>.
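A brief sketch of this load/filter/plot cycle; loadSQM(), plotTaxonomy() and subsetTax() are assumed to be the relevant functions, with illustrative arguments:

  library(SQMtools)
  proj <- loadSQM("/path/to/project")                     # parse a SqueezeMeta project into one object
  plotTaxonomy(proj, rank = "phylum", count = "percent")  # taxonomic composition across samples
  proteo <- subsetTax(proj, rank = "phylum", tax = "Proteobacteria")  # keep one clade only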
The purpose of this package is to manipulate SVG files that are templates of the charts the user wants to produce. In vector graphics one works with x-/y-coordinates of elements (e.g. lines, rectangles, text), whose scale often depends on the program used to produce the graphics. In applied statistics one usually has numeric values on a fixed scale (e.g. percentage values between 0 and 100) to show in a chart. Essentially, svgtools transforms the statistical values into coordinates and widths/heights in the vector graphics. This is done by stackedBar() for bar charts, by linesSymbols() for charts with lines and/or symbols (dot markers), and by scatterSymbols() for scatterplots.
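A hypothetical sketch of this transformation step, built around the stackedBar() function named above; the reader/writer helpers and argument names here are assumptions, not the package's documented interface:

  library(svgtools)
  svg <- read_svg("template_barchart.svg")      # assumed helper for reading the SVG template
  # map percentage values (0-100) onto the template's bar segments
  svg <- stackedBar(svg, frame_name = "bars", values = c(25, 50, 25))
  write_svg(svg, file = "barchart_filled.svg")  # assumed helper for writing the result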
Estimates the authors or speakers of texts. Methods developed in Huang, Perry, and Spirling (2020) <doi:10.1017/pan.2019.49>. The model is built on a Bayesian framework in which the distinctiveness of each speaker is defined by how different, on average, the speaker's terms are from everyone else's in the corpus of texts. An optional cross-validation method is implemented to select the subset of terms that generate the most accurate speaker predictions. Once a set of terms is selected, the model can be estimated. Speaker distinctiveness and term influence can be recovered from the model's parameters using package functions. Once fitted, the model can be used to predict the authorship of new texts.
This package implements the navigated weighting (NAWT) proposed by Katsumata (2020) <arXiv:2005.10998>, which improves inverse probability weighting by utilizing estimating equations suitable for a specific pre-specified parameter of interest (e.g., the average treatment effects or the average treatment effects on the treated) in propensity score estimation. It includes the covariate balancing propensity score proposed by Imai and Ratkovic (2014) <doi:10.1111/rssb.12027>, which uses covariate balancing conditions in propensity score estimation. The point estimate of the parameter of interest, as well as the coefficients for propensity score estimation and their uncertainty, are produced using M-estimation. The same functions can be used to estimate average outcomes in missing-outcome cases.
In the spirit of Anscombe's quartet, this package includes datasets that demonstrate the importance of visualizing your data, the importance of not relying on statistical summary measures alone, and why additional assumptions about the data-generating mechanism are needed when estimating causal effects. The package includes "Anscombe's Quartet" (Anscombe 1973) <doi:10.1080/00031305.1973.10478966>, the "Causal Quartet" of D'Agostino McGowan & Barrett (2023) <doi:10.48550/arXiv.2304.02683>, the "Datasaurus Dozen" (Matejka & Fitzmaurice 2017), the "Interaction Triptych" (Rohrer & Arslan 2021) <doi:10.1177/25152459211007368>, the "Rashomon Quartet" (Biecek et al. 2023) <doi:10.48550/arXiv.2302.13356>, and Gelman's "Variation and Heterogeneity Causal Quartets" (Gelman et al. 2023) <doi:10.48550/arXiv.2302.12878>.
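A minimal usage sketch; the dataset name anscombe_quartet and its dataset/x/y columns are assumptions about how the data are exposed:

  library(quartets)
  library(ggplot2)
  # near-identical summary statistics, very different shapes once plotted
  ggplot(anscombe_quartet, aes(x = x, y = y)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    facet_wrap(~ dataset)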
In epigenome-wide association studies, the measured signals for each sample are a mixture of methylation profiles from different cell types. Current approaches to association detection only claim whether a cytosine-phosphate-guanine (CpG) site is associated with the phenotype or not; they cannot determine the cell type in which the risk-CpG site is affected by the phenotype. We propose a solid statistical method, HIgh REsolution (HIRE), which not only substantially improves the power of association detection at the aggregated level as compared to existing methods but also enables the detection of risk-CpG sites for individual cell types. The HIREewas R package implements the HIRE model in R.
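A toy sketch of a HIRE run; the HIRE() signature below follows the model description and is an assumption rather than the exact documented interface:

  library(HIREewas)
  Ometh <- matrix(runif(1000 * 20), nrow = 1000)  # toy CpG-by-sample methylation matrix
  X <- matrix(rbinom(20, 1, 0.5), nrow = 1)       # toy phenotype (covariate-by-sample)
  res <- HIRE(Ometh, X, num_celltype = 3)         # fit HIRE assuming 3 cell types
  str(res)  # cell-type-specific profiles, proportions, and p-values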
OMICsPCA is an analysis pipeline designed to integrate multi-OMICs experiments done on various subjects (e.g. cell lines, individuals), treatments (e.g. disease/control) or time points, and to analyse such integrated data from various angles and perspectives. At its core, OMICsPCA uses Principal Component Analysis (PCA) to integrate multi-omics experiments from various sources and thus has the ability to overcome data insufficiency issues by using the integrated data as representatives. OMICsPCA can be used in various applications, including analysis of the overall distribution of OMICs assays across various samples/individuals/time points; grouping assays by user-defined conditions; and identification of sources of variation and of similarity/dissimilarity between assays, variables or individuals.
Analyze telemetry datasets generalized to allow any technology. The filtering steps check for false positives caused by reflected transmissions from surfaces and false pings from other noise-generating equipment. The filters are based on the JSATS filtering algorithms found in the package filteRjsats <https://CRAN.R-project.org/package=filteRjsats> but have been generalized to allow the user to define many of the filtering variables. Additionally, this package contains scripts used to help identify an optimal maximum blanking period as defined in Capello et al. (2015) <doi:10.1371/journal.pone.0134002>. The functions were written according to their manuscript description, but have not been reviewed by the authors for accuracy; they are included here as is, without warranty.
Implementation of Johansen's general formulation of the Welch-James statistic with Approximate Degrees of Freedom, which makes it suitable for testing any linear hypothesis concerning cell means in univariate and multivariate mixed model designs when the data exhibit non-normality and non-homogeneous variance. Some improvements, namely trimmed means and Winsorized variances, and bootstrapping for calculating an empirical critical value, have been added to the classical formulation. The code derives from a previous SAS implementation by L.M. Lix and H.J. Keselman, available at <http://supp.apa.org/psycarticles/supplemental/met_13_2_110/SAS_Program.pdf> and published in Keselman, H.J., Wilcox, R.R., and Lix, L.M. (2003) <doi:10.1111/1469-8986.00060>.
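A hedged sketch of a call to the package's main entry point, welchADF.test(); the dataset and argument names follow the package vignette but should be verified against the documentation:

  library(welchADF)
  # omnibus test on a between-subjects factor, using trimmed means,
  # Winsorized variances and a bootstrapped critical value
  fit <- welchADF.test(womenStereotypeData, response = "y", between.s = "condition",
                       contrast = "omnibus", trimming = TRUE, bootstrap = TRUE)
  summary(fit)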
This is a package for graphical and statistical analyses of environmental data, with a focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. It provides major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. It comes with numerous built-in data sets from regulatory guidance documents and environmental statistics literature. It includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <https://link.springer.com/book/10.1007/978-1-4614-8456-1>).
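For instance, estimating the mean of a lognormal concentration variable with a confidence interval takes two lines (the simulated data stand in for real monitoring measurements):

  library(EnvStats)
  x <- rlnormAlt(30, mean = 10, cv = 1)  # simulated concentrations (EnvStats' alternative lognormal)
  elnormAlt(x, ci = TRUE)                # estimates of mean and CV, with a CI for the mean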
Original ctsem (continuous time structural equation modelling) functionality, based on the OpenMx software, as described in Driver, Oud, and Voelkle (2017) <doi:10.18637/jss.v077.i05>, with updated details in the vignette. Combines stochastic differential equations representing latent processes with structural equation measurement models. These functions were split off from the main ctsem package, as the main package now uses the rstan package as a backend, offering estimation options from maximum likelihood to Bayesian. There are nevertheless use cases for the wide-format SEM style approach offered here, particularly when there are no individual differences in observation timing and the number of individuals is large. For the main ctsem package, see <https://cran.r-project.org/package=ctsem>.
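A compact sketch of the wide-format workflow, assuming the classic ctModel()/ctFit() pair from the original ctsem interface (model dimensions and the data object mywidedata are illustrative):

  library(ctsemOMX)
  # one latent process measured by two manifest indicators at five time points
  model <- ctModel(type = "omx", n.latent = 1, n.manifest = 2, Tpoints = 5,
                   LAMBDA = matrix(c(1, 0.5), nrow = 2))
  fit <- ctFit(dat = mywidedata, ctmodelobj = model)  # mywidedata: hypothetical wide-format data
  summary(fit)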
This package provides methods for testing the equality between groups of estimated density functions. The package implements FDET (Fourier-based Density Equality Testing) and MDET (Moment-based Density Equality Testing), two new approaches introduced by the author. Both methods extend an earlier testing approach by Delicado (2007), "Functional k-sample problem when data are density functions" <doi:10.1007/s00180-007-0047-y>, which is referred to as DET (Density Equality Testing) in this package for clarity. FDET compares groups of densities based on their global shape using Fourier transforms, while MDET tests for differences in distributional moments. All methods are described in Anarat, Krutmann and Schwender (2025), "Testing for Differences in Extrinsic Skin Aging Based on Density Functions" (Submitted).
Constructing niche models and analyzing patterns of niche evolution. Acts as an interface for many popular modeling algorithms, and allows users to conduct Monte Carlo tests to address basic questions in evolutionary ecology and biogeography. Warren, D.L., R.E. Glor, and M. Turelli (2008) <doi:10.1111/j.1558-5646.2008.00482.x>; Glor, R.E., and D.L. Warren (2011) <doi:10.1111/j.1558-5646.2010.01177.x>; Warren, D.L., R.E. Glor, and M. Turelli (2010) <doi:10.1111/j.1600-0587.2009.06142.x>; Cardillo, M., and D.L. Warren (2016) <doi:10.1111/geb.12455>; Warren, D.L., L.J. Beaumont, R. Dinnage, and J.B. Baumgartner (2019) <doi:10.1111/ecog.03900>.
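A hedged sketch of one such Monte Carlo test, the niche identity test of Warren et al. (2008); the enmtools.species objects and the identity.test() call are assumed from the package documentation:

  library(ENMTools)
  # sp1, sp2: enmtools.species objects holding presence points; env: a raster stack of predictors
  id <- identity.test(species.1 = sp1, species.2 = sp2, env = env,
                      type = "glm", nreps = 99)
  id  # observed niche overlap vs. the pseudoreplicated null distribution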
The lipid-scrambling activity of protein extracts and purified scramblases is often determined using a fluorescence-based assay involving many manual steps. flippant offers an integrated solution for the analysis and publication-grade graphical presentation of dithionite scramblase assays, as well as a platform for review, dissemination and extension of the strategies it employs. The package's name derives from a play on the fact that lipid scrambling is also sometimes referred to as 'flipping'. The package was originally published as Cotton, R.J., Ploier, B., Goren, M.A., Menon, A.K., and Graumann, J. (2017). "flippant: An R package for the automated analysis of fluorescence-based scramblase assays." BMC Bioinformatics 18, 146. <doi:10.1186/s12859-017-1542-y>.
Handles univariate non-parametric density estimation with parametric starts and asymmetric kernels in a simple and flexible way. Kernel density estimation with parametric starts involves fitting a parametric density to the data before making a correction with kernel density estimation, see Hjort & Glad (1995) <doi:10.1214/aos/1176324627>. Asymmetric kernels make kernel density estimation more efficient on bounded intervals such as (0, 1) and the positive half-line. Supported asymmetric kernels are the gamma kernel of Chen (2000) <doi:10.1023/A:1004165218295>, the beta kernel of Chen (1999) <doi:10.1016/S0167-9473(99)00010-9>, and the copula kernel of Jones & Henderson (2007) <doi:10.1093/biomet/asm068>. User-supplied kernels, parametric starts, and bandwidths are supported.
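A minimal example on positive data, combining a gamma parametric start with a gamma kernel (the dataset is arbitrary):

  library(kdensity)
  x <- airquality$Wind                              # positive-valued sample
  kde <- kdensity(x, start = "gamma", kernel = "gamma")
  kde(10)     # the fit is returned as a callable density; evaluate it at 10
  plot(kde)   # and it has a plot method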
This package provides a utility library to facilitate the generalization of statistical methods built on a regression framework. Package developers can use modelObj methods to initiate a regression analysis without concern for the details of the regression model and the method used to obtain parameter estimates. The specifics of the regression step are left to the user to define when calling the function. The user of a function developed within the modelObj framework provides as input a modelObj that contains the model and the R methods to be used to obtain parameter estimates and predictions. In this way, a user can easily go from linear to non-linear models within the same package.
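A short sketch of both sides of that contract, assuming the constructor buildModelObj() and the generics fit() and predict() (argument names follow the package manual but should be verified):

  library(modelObj)
  mydata <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
  mydata$y <- 2 * mydata$x1 - mydata$x2 + rnorm(100)
  # user side: bundle the model with fitting and prediction methods
  mo <- buildModelObj(model = ~ x1 + x2,
                      solver.method = "lm", predict.method = "predict.lm")
  # developer side: obtain estimates without knowing which regression was chosen
  fobj <- fit(object = mo, data = mydata, response = mydata$y)
  head(predict(fobj, newdata = mydata))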
The fossil record is a joint expression of ecological, taphonomic, evolutionary, and stratigraphic processes (Holland and Patzkowsky, 2012, ISBN:978-0226649382). This package allows simulating biological processes in the time domain (e.g., trait evolution, fossil abundance, phylogenetic trees), and examining how their expression in the rock record (stratigraphic domain) is shaped by age-depth models, ecological niche models, and taphonomic effects. Functions simulating common processes used in modeling trait evolution or event-type data such as first/last occurrences are provided and can be used standalone or as part of a pipeline. The package comes with example data sets and tutorials in several vignettes, which can be used as templates to set up one's own simulation.