This package provides methods to construct frequentist confidence sets with valid marginal coverage for identifying the population-level argmin or argmax based on IID data. For instance, given an n-by-p loss matrix, where n is the sample size and p is the number of models, the CS.argmin() method produces a discrete confidence set that contains the model with the minimal (best) expected risk with the desired probability. The argmin.HT() method helps check whether a specific model should be included in such a confidence set. The main implemented method is proposed in Tianyu Zhang, Hao Lee and Jing Lei (2024) "Winners with confidence: Discrete argmin inference with an application to model selection".
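A minimal sketch of the intended workflow, assuming CS.argmin() takes the n-by-p loss matrix and a significance level (the argument names alpha and r are illustrative assumptions, not taken from the package documentation):

    set.seed(1)
    n <- 100; p <- 5
    # losses of five models; column 1 has the smallest expected risk
    loss <- matrix(rnorm(n * p, mean = rep(c(1, 1.2, 1.5, 1.5, 2), each = n)),
                   nrow = n, ncol = p)
    cs <- CS.argmin(loss, alpha = 0.05)        # 95% discrete confidence set
    ht <- argmin.HT(loss, r = 3, alpha = 0.05) # should model 3 be included?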
The BioTIME database was first published in 2018 and has since inspired ideas, questions, projects, and research articles. To make it even more accessible, the BioTIMEr R package provides tools designed to interact with the BioTIME database. The functions provided include the BioTIME-recommended methods for preparing time series data (gridding and rarefaction), a selection of standard biodiversity metrics (including species richness, numerical abundance and exponential Shannon), alongside examples of how to display change over time. It also includes a sample subset of both the query data and metadata, the full versions of which are freely available on the BioTIME website <https://biotime.st-andrews.ac.uk/home.php>.
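A hypothetical sketch of the preparation workflow described above; the sample-data object and the gridding/rarefaction function names below are assumptions for illustration and should be checked against the package documentation:

    library(BioTIMEr)
    # sample subset of the BioTIME query data shipped with the package
    # (object and function names assumed)
    data("BTsubset_data")
    gridded  <- gridding(BTsubset_data)   # assign records to spatial grid cells
    rarefied <- resampling(gridded)       # sample-based rarefaction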
Estimates coefficients of penalized linear regression and generalized linear models; lasso and elastic-net penalties are currently supported. The package uses an accurate approximation of the L1 penalty together with a modified Jacobi algorithm to estimate the coefficients. There is provision for plotting the solution paths and for predicting coefficients at given values of lambda. The package also contains cross-validation functions to select a suitable lambda value for the data, and a function for estimation in fused-lasso penalized linear regression. For more details, see Mandal, B. N. (2014), "Computational methods for L1 penalized GLM model fitting", unpublished report submitted to Macquarie University, NSW, Australia.
Calculates the density, cumulative distribution function, quantile function, and random number generation for neo-normal distributions. It also interfaces with the brms package, allowing the use of a neo-normal distribution as a custom family; this integration enables the application of various brms formulas for neo-normal regression. The Modified to be Stable as Normal from Burr (MSNBurr), Modified to be Stable as Normal from Burr-IIa (MSNBurr-IIa), Generalized MSNBurr (GMSNBurr), Jones-Faddy Skew-t, Fernandez-Osiewalski-Steel Skew Exponential Power, and Jones Skew Exponential Power distributions are supported. References: Choir, A. S. (2020), unpublished dissertation; Iriawan, N. (2000), unpublished dissertation; Rigby, R. A., Stasinopoulos, M. D., Heller, G. Z., & Bastiani, F. D. (2019) <doi:10.1201/9780429298547>.
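Following the usual d/p/q/r naming convention in R, a sketch for the MSNBurr family (the function names and the alpha parameterization are convention-based assumptions, not verified against the package):

    library(neodistr)
    x <- seq(-4, 4, by = 0.1)
    dens  <- dmsnburr(x, mu = 0, sigma = 1, alpha = 2)    # density (assumed)
    draws <- rmsnburr(1000, mu = 0, sigma = 1, alpha = 2) # random draws (assumed)
    # as a brms custom family (constructor name assumed):
    # fit <- brms::brm(y ~ x, data = dat, family = msnburr())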
We developed a comprehensive tool that helps with the visualization and analysis of networks sharing the same variables across multiple factor levels. netShiny contains most of the popular network features, such as centrality measures, modularity, and other summary statistics (e.g. the clustering coefficient). It also contains well-known tools for examining the (dis)similarities between two networks, such as pairwise distance measures between networks, set operations on the nodes of the networks, the distribution of edge weights, and a network representing the difference between two correlation matrices. The package netShiny also contains tools to perform bootstrapping and to find clusters in networks. See the netShiny manual for more information, documentation and examples.
We extend two general method of moments (GMM) estimators to panel vector autoregression (PVAR) models with p lags of endogenous variables, predetermined variables, and strictly exogenous variables. This general PVAR model encompasses the first-difference GMM estimator of Holtz-Eakin et al. (1988) <doi:10.2307/1913103> and Arellano and Bond (1991) <doi:10.2307/2297968>, and the system GMM estimator of Blundell and Bond (1998) <doi:10.1016/S0304-4076(98)00009-8>. We also provide specification tests (Hansen overidentification test, lag selection criterion, and stability test of the PVAR polynomial) and classical structural analysis for PVAR models, such as orthogonal and generalized impulse response functions, bootstrapped confidence intervals for impulse response analysis, and forecast error variance decompositions.
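For instance, a first-difference GMM fit of a PVAR(1) might look like the following (the pvargmm() argument names are best-effort recollections of the panelvar interface and should be treated as assumptions):

    library(panelvar)
    fit <- pvargmm(
      dependent_vars   = c("y1", "y2"),   # endogenous variables
      lags             = 1,
      transformation   = "fd",            # first-difference GMM
      data             = mydata,          # a panel data frame (placeholder)
      panel_identifier = c("id", "year"),
      steps            = "twostep"
    )
    summary(fit)
    irf <- oirf(fit, n.ahead = 8)         # orthogonal impulse responses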
Offers a helping hand to psychologists and other behavioral scientists who routinely deal with experimental data from factorial experiments. It includes several functions to format output from other R functions according to the style guidelines of the APA (American Psychological Association). This formatted output can be copied directly into manuscripts to facilitate data reporting. These features are backed up by a toolkit of several small helper functions, e.g., offering out-of-the-box outlier removal. The package owes its name to Georg "Schorsch" Schuessler, ingenious technician at the Department of Psychology III, University of Wuerzburg. For details on the implemented methods, see Roland Pfister and Markus Janczyk (2016) <doi:10.20982/tqmp.12.2.p147>.
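A brief sketch of the intended use, assuming APA-formatting helpers named t_out() and anova_out() (names recalled from the package; verify against its documentation):

    library(schoRsch)
    # format a paired t-test on the built-in sleep data in APA style
    tt <- t.test(extra ~ group, data = sleep, paired = TRUE)
    t_out(tt)   # prints a manuscript-ready string, e.g. "t(9) = ..., p ..."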
Generates balancing weights for causal effect estimation in observational studies with binary, multi-category, or continuous point or longitudinal treatments by easing and extending the functionality of several R packages and providing in-house estimation methods. Available methods include those that rely on parametric modeling, optimization, and machine learning. Also allows for assessment of weights and checking of covariate balance by interfacing directly with the cobalt package. Methods for estimating weighted regression models that take into account uncertainty in the estimation of the weights via M-estimation or bootstrapping are available. See the vignette "Installing Supporting Packages" for instructions on how to install any package WeightIt uses, including those that may not be on CRAN.
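A minimal example of the core workflow (the lalonde data are assumed to be available through cobalt, which WeightIt interfaces with):

    library(WeightIt)
    data("lalonde", package = "cobalt")
    # propensity score weights targeting the ATT via logistic regression
    W <- weightit(treat ~ age + educ + race + married + re74,
                  data = lalonde, method = "glm", estimand = "ATT")
    summary(W)          # distribution of the estimated weights
    cobalt::bal.tab(W)  # covariate balance assessment via cobalt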
Implements algorithms and fits models when sites (possibly remote) share computation summaries rather than actual data over HTTP with a master R process (using 'opencpu', for example). A stratified Cox model and a singular value decomposition are provided; the underlying Cox model code is derived from that in the R 'survival' package. Sites may provide data via several means: CSV files, the REDCap API, etc. An extensible design allows new methods to be added in the future and includes facilities for local prototyping and testing. Web applications are provided (via 'shiny') for the implemented methods to help in designing and deploying the computations.
This package provides functions that calculate common types of splitting criteria used in random forests for classification problems, as well as functions that make predictions based on a single tree or a Forest-R.K. model. The package also provides functions to generate an importance plot for a Forest-R.K. model, as well as a 2D multidimensional-scaling plot of data points colour-coded by the class types predicted by the Forest-R.K. model. This package is based on: Bernard, S., Heutte, L., Adam, S. (2008, ISBN:978-3-540-85983-3) "Forest-R.K.: A New Random Forest Induction Method", Fourth International Conference on Intelligent Computing, September 2008, Shanghai, China, pp. 430-437.
Estimates the probability of informed trading (PIN), initially introduced by Easley et al. (1996) <doi:10.1111/j.1540-6261.1996.tb04074.x>. The contribution of the package is that it uses the likelihood factorizations of Easley et al. (2010) <doi:10.1017/S0022109010000074> (EHO factorization) and Lin and Ke (2011) <doi:10.1016/j.finmar.2011.03.001> (LK factorization). Moreover, the package offers different estimation algorithms: specifically, the grid-search algorithm proposed by Yan and Zhang (2012) <doi:10.1016/j.jbankfin.2011.08.003>, and the hierarchical agglomerative clustering approach proposed by Gan et al. (2015) <doi:10.1080/14697688.2015.1023336> and later extended by Ersan and Alici (2016) <doi:10.1016/j.intfin.2016.04.001>.
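Because the estimation interface is not shown above, the following is an illustrative, fully hypothetical sketch (pin_est() and its arguments are invented placeholders, not functions from the package):

    # daily buy/sell order counts for one quarter
    buys  <- rpois(60, lambda = 90)
    sells <- rpois(60, lambda = 80)
    # est <- pin_est(buys, sells, factorization = "LK",
    #                algorithm = "YZ")  # hypothetical name and arguments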
Bayesian estimation and analysis methods for Probit Unfolding Models (PUMs), a novel class of scaling models designed for binary preference data. These models allow for both monotonic and non-monotonic response functions. The package supports Bayesian inference for both static and dynamic PUMs using Markov chain Monte Carlo (MCMC) algorithms with minimal or no tuning. Key functionalities include posterior sampling, hyperparameter selection, data preprocessing, model fit evaluation, and visualization. The methods are particularly suited to analyzing voting data, such as from the U.S. Congress or Supreme Court, but can also be applied in other contexts where non-monotonic responses are expected. For methodological details, see Shi et al. (2025) <doi:10.48550/arXiv.2504.00423>.
It provides versatile tools for the analysis of birth-and-death-based Markovian queueing models and of single- and multiclass product-form queueing networks. It implements M/M/1, M/M/c, M/M/Infinite, M/M/1/K, M/M/c/K, M/M/c/c, M/M/1/K/K, M/M/c/K/K, M/M/c/K/m, M/M/Infinite/K/K, multiple-channel open Jackson networks, multiple-channel closed Jackson networks, single-channel multiple-class open networks, single-channel multiple-class closed networks, and single-channel multiple-class mixed networks. It also provides Erlang-B, Erlang-C and Engset calculators. This work is dedicated to the memory of D. Sixto Rios Insua.
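For example, an M/M/1 queue can be set up and solved as follows (this NewInput/QueueingModel pattern is recalled from the queueing package; verify the signatures locally):

    library(queueing)
    inp <- NewInput.MM1(lambda = 2, mu = 3, n = 0)  # arrival and service rates
    mod <- QueueingModel(inp)
    summary(mod)   # mean queue length, waiting time, throughput, etc.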
SqueezeMeta is a versatile pipeline for the automated analysis of metagenomics/metatranscriptomics data (<https://github.com/jtamames/SqueezeMeta>). This package provides functions for loading SqueezeMeta results into R, filtering them based on different criteria, and visualizing the results with basic plots. The SqueezeMeta project (and any subset of it generated by the different filtering functions) is parsed into a single object, whose components (e.g. tables with the taxonomic or functional composition across samples, contig/gene abundance profiles) can be easily analyzed using other R packages such as 'vegan' or 'DESeq2'. The methods in this package are further described in Puente-Sánchez et al. (2020) <doi:10.1186/s12859-020-03703-2>.
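A short sketch of the workflow (loadSQM() and the subsetting/plotting helpers are recalled from SQMtools; treat the exact arguments as assumptions):

    library(SQMtools)
    proj <- loadSQM("/path/to/squeezemeta_project")  # parse a project directory
    # keep contigs classified as Proteobacteria, then plot genus composition
    prot <- subsetTax(proj, rank = "phylum", tax = "Proteobacteria")
    plotTaxonomy(prot, rank = "genus", count = "percent")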
Estimates the authors or speakers of texts, using methods developed in Huang, Perry, and Spirling (2020) <doi:10.1017/pan.2019.49>. The model is built on a Bayesian framework in which the distinctiveness of each speaker is defined by how different, on average, the speaker's terms are from everyone else's in the corpus of texts. An optional cross-validation method is implemented to select the subset of terms that generates the most accurate speaker predictions. Once a set of terms is selected, the model can be estimated. Speaker distinctiveness and term influence can be recovered from the model's parameters using package functions. Once fitted, the model can be used to predict the authorship of new texts.
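The typical pipeline might look like this (function names follow the stylest package as best recalled and may differ; texts, authors and new_text are placeholder objects):

    library(stylest)
    # choose the vocabulary cutoff by cross-validation, then fit
    vocab <- stylest_select_vocab(texts, authors)             # assumed defaults
    terms <- stylest_terms(texts, authors, vocab$cutoff_pct_best)
    fit   <- stylest_fit(texts, authors, terms)
    pred  <- stylest_predict(fit, new_text)                   # authorship of a new text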
The purpose of this package is to manipulate SVG files that serve as templates for the charts the user wants to produce. In vector graphics one works with the x-/y-coordinates of elements (e.g. lines, rectangles, text), whose scale often depends on the program used to produce the graphics. In applied statistics one usually has numeric values on a fixed scale (e.g. percentage values between 0 and 100) to show in a chart. In essence, svgtools transforms the statistical values into coordinates and widths/heights in the vector graphics. This is done by stackedBar() for bar charts, by linesSymbols() for charts with lines and/or symbols (dot markers), and by scatterSymbols() for scatterplots.
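A sketch of the idea with stackedBar(); the reader/writer helpers and the argument names shown are assumptions for illustration only:

    library(svgtools)
    svg <- readSVG("template.svg")                    # assumed reader function
    # map the values 25/40/35 (on a 0-100 scale) onto the template's bars
    svg <- stackedBar(svg, frame_name = "bars",
                      values = c(25, 40, 35),
                      scale_real = c(0, 100))         # argument names assumed
    # writeSVG(svg, "chart.svg")                      # assumed writer function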
Allows users to quickly load multiple patients' electrocardiographic (ECG) data at once and conduct the relevant time analysis of heart rate variability (HRV) without manual edits from a physician or data cleaning specialist. The package provides the unique ability to iteratively filter, plot, and store time analysis results in a data frame while writing plots to a predefined folder. This streamlines the workflow for HRV analysis across multiple datasets. Methods are based on Rodríguez-Liñares et al. (2011) <doi:10.1016/j.cmpb.2010.05.012>. Examples of applications using this package include Kwon et al. (2022) <doi:10.1007/s10286-022-00865-2> and Lawrence et al. (2023) <doi:10.1016/j.autneu.2022.103056>.
A package for graphical and statistical analyses of environmental data, with a focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. It provides major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. It comes with numerous built-in data sets from regulatory guidance documents and the environmental statistics literature, and includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <https://link.springer.com/book/10.1007/978-1-4614-8456-1>).
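For example, using estimation and goodness-of-fit functions that EnvStats exports (to the best of my recollection of its interface):

    library(EnvStats)
    # simulated concentrations; lognormal models are common for such data
    conc <- rlnorm(20, meanlog = 1, sdlog = 0.5)
    elnorm(conc, ci = TRUE)                  # lognormal parameters with a CI
    gofTest(conc, distribution = "lnorm")    # goodness-of-fit test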
In epigenome-wide association studies, the measured signals for each sample are a mixture of methylation profiles from different cell types. Current approaches to association detection only claim whether a cytosine-phosphate-guanine (CpG) site is associated with the phenotype or not; they cannot determine the cell type in which the risk CpG site is affected by the phenotype. We propose a rigorous statistical method, HIgh REsolution (HIRE), which not only substantially improves the power of association detection at the aggregated level compared to existing methods, but also enables the detection of risk CpG sites for individual cell types. The 'HIREewas' package implements the HIRE model in R.
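A hedged sketch, assuming the main interface is a HIRE() function taking a CpG-by-sample methylation matrix, a covariate matrix, and the presumed number of cell types (the argument names are best-effort recollections):

    library(HIREewas)
    # Ometh: CpG-by-sample methylation matrix; X: phenotype covariates
    ret <- HIRE(Ometh, X, num_celltype = 3)   # assumed signature
    # ret should carry cell-type-specific association estimates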
OMICsPCA is an analysis pipeline designed to integrate multi-OMICs experiments done on various subjects (e.g. cell lines, individuals), treatments (e.g. disease/control) or time points, and to analyse such integrated data from various angles and perspectives. At its core, OMICsPCA uses Principal Component Analysis (PCA) to integrate multi-omics experiments from various sources and thus can overcome data-insufficiency issues by using the integrated data as representatives. OMICsPCA can be used in various applications, including analysis of the overall distribution of OMICs assays across various samples/individuals/time points; grouping assays by user-defined conditions; and identification of sources of variation and of similarity/dissimilarity between assays, variables or individuals.
Provides comprehensive tools for analysing and characterizing mixed-level factorial designs arranged in blocks. Includes construction and validation of incidence structures, computation of C-matrices, evaluation of A-, D-, E-, and MV-efficiencies, checking of orthogonal factorial structure (OFS), diagnostics based on Hamming distance, discrepancy measures, the B-criterion, Es^2 statistics, J2-distance and J2-efficiency, Phi-p optimality, and symmetry conditions for universal optimality. The methodological framework follows foundational work on factorial and mixed-level design assessment by Xu and Wu (2001) <doi:10.1214/aos/1013699993> and Gupta (1983) <doi:10.1111/j.2517-6161.1983.tb01253.x>. These methods assist in selecting, comparing, and studying factorial block designs across a range of experimental situations.
This package implements the navigated weighting (NAWT) proposed by Katsumata (2020) <arXiv:2005.10998>, which improves on inverse probability weighting by utilizing estimating equations suitable for a specific pre-specified parameter of interest (e.g., the average treatment effect or the average treatment effect on the treated) in propensity score estimation. It includes the covariate balancing propensity score proposed by Imai and Ratkovic (2014) <doi:10.1111/rssb.12027>, which uses covariate balancing conditions in propensity score estimation. The point estimate of the parameter of interest, as well as the coefficients for propensity score estimation and their uncertainty, are produced using M-estimation. The same functions can be used to estimate average outcomes in missing-outcome cases.
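A sketch assuming the estimation function is nawt() (the argument names approximate its interface and are not verified; df is a placeholder data frame):

    library(nawtilus)
    # navigated weighting for the average treatment effect on the treated
    fit <- nawt(treat ~ x1 + x2, outcome = "y", estimand = "ATT",
                method = "score", data = df)   # argument names assumed
    summary(fit)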
In the spirit of Anscombe's quartet, this package includes datasets that demonstrate the importance of visualizing your data, the importance of not relying on statistical summary measures alone, and why additional assumptions about the data-generating mechanism are needed when estimating causal effects. The package includes "Anscombe's Quartet" (Anscombe 1973) <doi:10.1080/00031305.1973.10478966>, the D'Agostino McGowan & Barrett (2023) "Causal Quartet" <doi:10.48550/arXiv.2304.02683>, the "Datasaurus Dozen" (Matejka & Fitzmaurice 2017), the "Interaction Triptych" (Rohrer & Arslan 2021) <doi:10.1177/25152459211007368>, the "Rashomon Quartet" (Biecek et al. 2023) <doi:10.48550/arXiv.2302.13356>, and the "Variation and Heterogeneity Causal Quartets" (Gelman et al. 2023) <doi:10.48550/arXiv.2302.12878>.
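For instance (the anscombe_quartet object name is assumed from the package's naming pattern):

    library(quartets)
    library(ggplot2)
    # identical summary statistics, visibly different relationships
    ggplot(anscombe_quartet, aes(x, y)) +
      geom_point() +
      geom_smooth(method = "lm", se = FALSE) +
      facet_wrap(~ dataset)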
Analyzes telemetry datasets, generalized to allow any technology. The filtering steps check for false positives caused by reflected transmissions from surfaces and for false pings from other noise-generating equipment. The filters are based on the JSATS filtering algorithms found in the package filteRjsats <https://CRAN.R-project.org/package=filteRjsats>, but have been generalized to allow the user to define many of the filtering variables. Additionally, this package contains scripts used to help identify an optimal maximum blanking period as defined in Capello et al. (2015) <doi:10.1371/journal.pone.0134002>. The functions were written according to the manuscript's description but have not been reviewed by the authors for accuracy; they are included here as is, without warranty.