High-performance implementation of various effect plots useful for regression and probabilistic classification tasks. The package includes partial dependence plots (Friedman, 2001, <doi:10.1214/aos/1013203451>), accumulated local effect plots and M-plots (both from Apley and Zhu, 2016, <doi:10.1111/rssb.12377>), as well as plots that describe the statistical associations between model response and features. It supports visualizations with either 'ggplot2' or 'plotly', and is compatible with most models, including 'Tidymodels', models wrapped in 'DALEX' explainers, or models with case weights.
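As an illustration of the partial dependence idea mentioned above, the following minimal Python sketch averages a model's predictions while one feature is forced to each value on a grid. The function `partial_dependence`, the toy model, and the data are hypothetical and are not part of the package's interface.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the feature in every row; other columns stay as observed.
        modified = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(row) for row in modified))
    return curve


# Hypothetical toy model: y = 2 * x1 + x2
def toy_model(row):
    return 2.0 * row["x1"] + row["x2"]


data = [{"x1": 0.0, "x2": 1.0}, {"x1": 1.0, "x2": 3.0}]
pd_curve = partial_dependence(toy_model, data, "x1", [0.0, 1.0, 2.0])
```

Because the toy model is additive, the resulting curve rises by exactly 2 per unit of `x1`, offset by the mean of `x2`.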
Gaussian processes are flexible distributions to model functional data. Whilst theoretically appealing, they are computationally cumbersome except for small datasets. This package implements two methods for scaling Gaussian process inference in 'Stan'. First, a sparse approximation of the likelihood that is generally applicable and, second, an exact method for regularly spaced data modeled by stationary kernels using fast Fourier methods. Utility functions are provided to compile and fit 'Stan' models using the 'cmdstanr' interface. References: Hoffmann and Onnela (2025) <doi:10.18637/jss.v112.i02>.
Similar to 'rstantools' for 'rstan', the 'instantiate' package builds pre-compiled 'CmdStan' models into CRAN-ready statistical modeling R packages. The models compile once during installation, the executables live inside the file systems of their respective packages, and users have the full power and convenience of 'cmdstanr' without any additional compilation after package installation. This approach saves time and helps R package developers migrate from 'rstan' to the more modern 'cmdstanr'. Packages 'rstantools', 'cmdstanr', 'stannis', and 'stanapi' are similar Stan clients with different objectives.
The package converts R data into input data for LocalSolver, executes the optimization, and exposes the optimization results as R data. LocalSolver (http://www.localsolver.com/) is an optimization engine developed by Innovation24 (http://www.innovation24.fr/). It is designed to solve large-scale mixed-variable non-convex optimization problems. The localsolver package is developed and maintained by WLOG Solutions (http://www.wlogsolutions.com/en/) in collaboration with the Decision Support and Analysis Division at the Warsaw School of Economics (http://www.sgh.waw.pl/en/).
Empirical statistical analysis, visualization and simulation of diffusion and contagion processes on networks. The package implements algorithms for calculating network diffusion statistics such as transmission rate, hazard rates, exposure models, network threshold levels, infectiousness (contagion), and susceptibility. The package is inspired by work published in Valente et al. (2015) <DOI:10.1016/j.socscimed.2015.10.001>; Valente (1995) <ISBN:9781881303213>; Myers (2000) <DOI:10.1086/303110>; Iyengar et al. (2011) <DOI:10.1287/mksc.1100.0566>; Burt (1987) <DOI:10.1086/228667>; among others.
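One simple form of the exposure statistic named above is the share of a node's neighbours that have already adopted. The Python sketch below computes it for a small graph; the function `exposure`, the adjacency-list representation, and the example graph are assumptions made for illustration and do not reflect the package's interface.

```python
def exposure(adjacency, adopted):
    """Fraction of each node's neighbours that have already adopted."""
    result = {}
    for node, neighbours in adjacency.items():
        if not neighbours:
            # Isolated nodes have no one to be exposed to.
            result[node] = 0.0
            continue
        adopters = sum(1 for n in neighbours if n in adopted)
        result[node] = adopters / len(neighbours)
    return result


# Hypothetical directed graph: each key lists the nodes it listens to.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"], "d": []}
node_exposure = exposure(graph, adopted={"a"})
```

Under this definition, a node's threshold can be read as its exposure at the moment it adopts.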
All PubChem compounds are downloaded to a local computer, but for each compound, only partial records are used. The data are organized into small files referenced by PubChem CID. This package also contains functions to parse the biologically relevant compounds from all PubChem compounds, using biological database sources, pathway presence, and taxonomic relationships. Taxonomy is used to generate a lowest common ancestor taxonomy ID (NCBI) for each biological metabolite, which then enables creation of taxonomically specific metabolome databases for any taxon.
Computes Phylogenetic Diversity (PD, Faith 1992), Evolutionary Distinctiveness (ED, Isaac et al. 2007), Phylogenetic Endemism (PE, Rosauer et al. 2009; Laffan et al. 2016), and Weighted Endemism (WE, Laffan et al. 2016) for presence-absence rasters. References: Faith, D. P. (1992) <doi:10.1016/0006-3207(92)91201-3>; Isaac, N. J. et al. (2007) <doi:10.1371/journal.pone.0000296>; Laffan, S. W. et al. (2016) <doi:10.1111/2041-210X.12513>; Rosauer, D. et al. (2009) <doi:10.1111/j.1365-294X.2009.04311.x>.
The main function of the package is the genetic effects analysis of small RNA in hybrid plants via two methods; it also provides various graphs related to data characteristics and expression analysis. Of the two classification methods, one calculates the additive (a) and dominant (d) effects, and the other evaluates expression level dominance by comparing the total expression of the small RNA in the progeny with the expression level in the parent species.
Calculate numerical agricultural soil management indicators from a management timeline of an arable field. Currently, indicators for carbon (C) input into the soil system, soil tillage intensity rating (STIR), number of soil cover and living plant cover days, N fertilization and livestock intensity, and plant diversity are implemented. The functions can also be used independently of the management timeline to calculate some indicators. The package contains tables with reference information for the functions, as well as a *.xlsx template to collect the management data.
This package performs two-way tests in independent groups designs. These are two-way ANOVA; two-way ANOVA under heteroscedasticity (a parametric bootstrap based generalized test and a generalized pivotal quantity based generalized test); and two-way ANOVA for medians, trimmed means, and M-estimators. The package also provides descriptive statistics and graphical approaches. Moreover, it assesses variance homogeneity and normality of data in each group via tests and plots. All 'twowaytests' functions are designed for a two-way layout (Dag et al., 2024, <doi:10.1016/j.softx.2024.101862>).
Estimates the predicted 10-year cardiovascular disease (CVD) risk score (as a probability) for civilian women, women military service members and veterans by inputting patient profiles. The proposed women CVD risk score improves the accuracy of the existing American College of Cardiology/American Heart Association CVD risk assessment tool in predicting long-term CVD risk for VA women, particularly in young and racial/ethnic minority women. See the reference: Jeon-Slaughter, H., Chen, X., Tsai, S., Ramanan, B., & Ebrahimi, R. (2021) <doi:10.1161/JAHA.120.019217>.
This package provides a collection of widely used univariate data sets from various applied domains for applications of distribution theory. The functions allow researchers and practitioners to quickly, easily, and efficiently access and use these data sets. The data sets cover the following applied domains: bio-medical, survival analysis, medicine, reliability analysis, hydrology, actuarial science, operational research, meteorology, extreme values, quality control, engineering, finance, sports and economics. In total, 100 data sets are documented along with associated references for further details and uses.
Given independent and identically distributed observations X(1), ..., X(n), computes the maximum likelihood estimator (MLE) of a probability mass function (pmf) under the assumption that it is log-concave; see Weyermann (2007) and Balabdaoui, Jankowski, Rufibach, and Pavlides (2012). The main functions of the package are logConDiscrMLE, which computes the log-concave MLE; logConDiscrCI, which computes pointwise confidence bands for the MLE; and kInflatedLogConDiscr, which computes a mixture of a log-concave pmf and a point mass at k.
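To make the log-concavity assumption concrete: a pmf on consecutive integers is log-concave when its support has no gaps and p(k)^2 >= p(k-1) * p(k+1) for every interior point k. The Python sketch below checks only this condition; it does not compute the MLE, and `is_log_concave` is a hypothetical helper, not one of the package's functions.

```python
def is_log_concave(pmf, tol=1e-12):
    """Check log-concavity of a pmf given as probabilities on 0, 1, 2, ..."""
    support = [i for i, p in enumerate(pmf) if p > 0]
    if not support:
        return False
    lo, hi = support[0], support[-1]
    # The support of a log-concave pmf must be a set of consecutive integers.
    if any(pmf[i] <= 0 for i in range(lo, hi + 1)):
        return False
    # Interior condition: p(k)^2 >= p(k-1) * p(k+1), up to a small tolerance.
    return all(
        pmf[i] ** 2 + tol >= pmf[i - 1] * pmf[i + 1]
        for i in range(lo + 1, hi)
    )


binomial_3_half = [0.125, 0.375, 0.375, 0.125]  # Binomial(3, 0.5)
bimodal = [0.4, 0.1, 0.5]
```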
This package contains functions to estimate the proportion of effects stronger than a threshold of scientific importance (function prop_stronger), to nonparametrically characterize the distribution of effects in a meta-analysis (calib_ests, pct_pval), to make effect size conversions (r_to_d, r_to_z, z_to_r, d_to_logRR), to compute and format inference in a meta-analysis (format_CI, format_stat, tau_CI), and to scrape results from existing meta-analyses for re-analysis (scrape_meta, parse_CI_string, ci_to_var).
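For orientation, the Python sketch below shows textbook versions of two such conversions: the Fisher transformation between a correlation r and z, and the common conversion from a point-biserial r to Cohen's d. The function names are hypothetical, and the package's own functions may take further arguments or use different formulas.

```python
import math


def fisher_z(r):
    """Fisher transformation of a correlation: z = atanh(r)."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Back-transform a Fisher z to a correlation: r = tanh(z)."""
    return math.tanh(z)


def r_to_cohens_d(r):
    """Textbook conversion for a point-biserial r: d = 2r / sqrt(1 - r^2)."""
    return 2.0 * r / math.sqrt(1.0 - r * r)


d_value = r_to_cohens_d(0.6)
round_trip = inverse_fisher_z(fisher_z(0.3))
```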
This package provides functions for specifying and fitting nested dichotomy logistic regression models for a multi-category response and methods for summarising and plotting those models. Nested dichotomies are statistically independent, and hence provide an additive decomposition of tests for the overall polytomous response. When the dichotomies make sense substantively, this method can be a simpler alternative to the standard multinomial logistic model which compares response categories to a reference level. See: J. Fox (2016), "Applied Regression Analysis and Generalized Linear Models", 3rd Ed., ISBN 1452205663.
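The decomposition can be illustrated with three categories A, B, C and two nested dichotomies: {A} versus {B, C}, then {B} versus {C}. In the Python sketch below, the category probabilities are products of the dichotomy probabilities. The function and its arguments are hypothetical, and the input probabilities stand in for fitted values from two separate logistic regressions.

```python
def category_probabilities(p_a_vs_rest, p_b_vs_c):
    """Combine two nested dichotomies into probabilities for A, B and C.

    p_a_vs_rest: P(A) from the first dichotomy, {A} vs {B, C}.
    p_b_vs_c:    P(B | not A) from the second dichotomy, {B} vs {C}.
    """
    p_not_a = 1.0 - p_a_vs_rest
    return {
        "A": p_a_vs_rest,
        "B": p_not_a * p_b_vs_c,
        "C": p_not_a * (1.0 - p_b_vs_c),
    }


probs = category_probabilities(0.5, 0.4)
```

The three probabilities always sum to one, whatever the two dichotomy probabilities are.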
This package provides wrapper functions to access ProPublica's Congress and Campaign Finance APIs. The Congress API provides near real-time access to legislative data from the House of Representatives, the Senate and the Library of Congress. The Campaign Finance API provides data from United States Federal Election Commission filings and other sources. The API covers summary information for candidates and committees, as well as certain types of itemized data. For more information about these APIs go to: <https://www.propublica.org/datastore/apis>.
An implementation of popular evaluation metrics that are commonly used in survival prediction, including Concordance Index, Brier Score, Integrated Brier Score, Integrated Square Error, Integrated Absolute Error and Mean Absolute Error. For detailed information on the different evaluation metrics, see Ishwaran H, Kogalur UB, Blackstone EH and Lauer MS (2008) <doi:10.1214/08-AOAS169>, Moradian H, Larocque D and Bellavance F (2017) <doi:10.1007/s10985-016-9372-1>, and Hanpu Zhou, Hong Wang, Sizheng Wang and Yi Zou (2023) <doi:10.32614/rj-2023-009>.
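As a rough illustration of the first metric, the Python sketch below computes a Harrell-style concordance index for right-censored data: a pair is comparable when the earlier time is an observed event, and it is concordant when the subject with the earlier event has the higher risk score. The function name and its handling of tied scores are assumptions; the package's implementation may differ.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style C-index; higher risk should mean earlier event."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: i has an observed event strictly before j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3, 4]
events = [1, 1, 0, 1]  # 0 marks a censored observation
c_perfect = concordance_index(times, events, [4, 3, 2, 1])
c_reversed = concordance_index(times, events, [1, 2, 3, 4])
```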
Additive copula regression for regression problems with binary outcome via gradient boosting [Brant, Hobæk Haff (2022); <arXiv:2208.04669>]. The fitting process includes a specialised model selection algorithm for each component, where each component is found (by greedy optimisation) among all the D-vines with only Gaussian pair-copulas of a fixed dimension, as specified by the user. When the variables and structure have been selected, the algorithm then re-fits the component where the pair-copula distributions can be different from Gaussian, if specified.
Computes diagnostics for linear regression when treatment effects are heterogeneous. The output of hettreatreg represents ordinary least squares (OLS) estimates of the effect of a binary treatment as a weighted average of the average treatment effect on the treated (ATT) and the average treatment effect on the untreated (ATU). The program estimates the OLS weights on these parameters, computes the associated model diagnostics, and reports the implicit OLS estimate of the average treatment effect (ATE). See Sloczynski (2019), <http://people.brandeis.edu/~tslocz/Sloczynski_paper_regression.pdf>.
This system allows one to model a multi-variate, multi-response problem with interaction effects. It combines the usual squared error loss for the multi-response problem with penalty terms that encourage correlated responses to form groups and allow for modeling main and interaction effects that exist within the covariates. The optimization method employed is the Alternating Direction Method of Multipliers (ADMM). The implementation is based on the methodology presented in Quachie Asenso, T., & Zucknick, M. (2023) <doi:10.48550/arXiv.2303.11155>.
Utilities to retrieve and tidy U.S. macroeconomic data series from public government data providers. Functions streamline access to series from the Federal Reserve Bank of St. Louis Federal Reserve Economic Data (FRED), the Bureau of Labor Statistics flat files, and the Bureau of Economic Analysis National Income and Product Accounts tables, then return consistent, tidy data frames ready for modeling and graphics. The package includes helpers for date alignment, log-linear projections, and common macro diagnostics, along with convenience plot builders for quick publication-quality charts.
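One common reading of a log-linear projection is to fit a straight line to the logarithm of a series by least squares and extrapolate it, which amounts to assuming a constant growth rate. The Python sketch below follows that reading; the function name, its arguments, and the method itself are assumptions and may not match the package's helper.

```python
import math


def log_linear_projection(values, horizon):
    """Fit log(y) = a + b * t by least squares and extrapolate `horizon` steps."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t = list(range(n))
    logs = [math.log(v) for v in values]  # requires strictly positive values
    t_mean = sum(t) / n
    log_mean = sum(logs) / n
    slope = sum((ti - t_mean) * (li - log_mean) for ti, li in zip(t, logs)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    intercept = log_mean - slope * t_mean
    # Project forward from the first period after the sample.
    return [math.exp(intercept + slope * (n + h)) for h in range(horizon)]


# Hypothetical series growing 10 percent per period.
projection = log_linear_projection([100.0, 110.0, 121.0], horizon=2)
```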
This package provides functions to run statistical analyses on surface-based neuroimaging data, computing measures including cortical thickness and surface area of the whole-brain and of the hippocampi. It can make use of 'FreeSurfer', 'fMRIprep', 'XCP-D', 'HCP' and 'CAT12' preprocessed datasets and 'HippUnfold' hippocampal segmentation outputs for a given sample by restructuring the data values into a single file. The single file can then be used by the package for analyses independently from its base dataset and without need for its access.
This package provides functions to manipulate binary fingerprints of arbitrary length. A fingerprint is represented by an object of S4 class fingerprint. The bitwise logical functions in R are overridden so that they can be used directly with fingerprint objects. A number of distance metrics are also available. Fingerprints can be converted to Euclidean vectors (i.e., points on the unit hypersphere) and can also be folded. Arbitrary fingerprint formats can be handled via line handlers. Currently handlers are provided for CDK, MOE and BCI fingerprint data.
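To illustrate the kind of operations described, the Python sketch below stores a fingerprint as an integer bitmask, computes the Tanimoto similarity of two fingerprints, and folds one by combining its two halves with OR. The representation, the function names, and the choice of OR for folding are assumptions made for this example; the package's S4 class and folding rule may differ.

```python
def tanimoto(a, b):
    """Tanimoto similarity: shared on-bits divided by on-bits in either."""
    either = bin(a | b).count("1")
    if either == 0:
        return 1.0  # convention chosen here for two all-zero fingerprints
    return bin(a & b).count("1") / either


def fold(bits, length):
    """Fold a fingerprint of even `length` in half by OR-ing the two halves."""
    if length % 2 != 0:
        raise ValueError("length must be even")
    half = length // 2
    mask = (1 << half) - 1
    return (bits & mask) | (bits >> half)


similarity = tanimoto(0b1100, 0b1010)
folded = fold(0b10010001, 8)
```

A distance can be obtained as one minus the similarity.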
Analyses districted electoral systems of any magnitude by computing district-party conversion ratios and seats-to-votes deviations, decomposing the sources of deviation. Traditional indexes are also computed. References: Kedar, O., Harsgor, L. and Sheinerman, R.A. (2016). <doi:10.1111/ajps.12225>. Penades, A and Pavia, J.M. (2025) "The decomposition of seats-to-votes distortion in elections: mean, variance, malapportionment and participation". Acknowledgements: The authors wish to thank Consellería de Educación, Cultura, Universidades y Empleo, Generalitat Valenciana (grant CIACO/2023/031) for supporting this research.
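Two widely used disproportionality indexes, Loosemore-Hanby and Gallagher, are both built from the gaps between vote shares and seat shares. The Python sketch below computes them for a made-up election. Whether the package computes these particular indexes is an assumption, and all names and figures are hypothetical.

```python
import math


def shares(counts):
    """Convert raw counts per party into shares that sum to one."""
    total = sum(counts.values())
    return {party: count / total for party, count in counts.items()}


def loosemore_hanby(vote_share, seat_share):
    """Half the sum of absolute seats-to-votes deviations."""
    return 0.5 * sum(
        abs(vote_share[p] - seat_share.get(p, 0.0)) for p in vote_share
    )


def gallagher(vote_share, seat_share):
    """Square root of half the sum of squared seats-to-votes deviations."""
    return math.sqrt(
        0.5 * sum((vote_share[p] - seat_share.get(p, 0.0)) ** 2 for p in vote_share)
    )


# Hypothetical election: parties without seats are treated as a zero seat share.
votes = shares({"A": 50, "B": 30, "C": 20})
seats = shares({"A": 6, "B": 3, "C": 1})
lh_index = loosemore_hanby(votes, seats)
gallagher_index = gallagher(votes, seats)
```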