Shed light on black box machine learning models with the help of model performance, variable importance, global surrogate models, ICE profiles, partial dependence (Friedman J. H. (2001) <doi:10.1214/aos/1013203451>), accumulated local effects (Apley D. W. (2016) <doi:10.48550/arXiv.1612.08468>), further effects plots, interaction strength, and variable contribution breakdown (Gosiewska and Biecek (2019) <doi:10.48550/arXiv.1903.11420>). All tools are implemented to work with case weights and allow for stratified analysis. Furthermore, multiple flashlights can be combined and analyzed together.
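A minimal sketch of the workflow, assuming the package's flashlight(), light_performance(), light_importance() and light_ice() functions; argument names reflect my reading of the package and may need checking against its documentation.

    library(flashlight)
    fit <- lm(Sepal.Length ~ ., data = iris)
    fl <- flashlight(model = fit, data = iris, y = "Sepal.Length", label = "lm")
    light_performance(fl)                    # model performance
    light_importance(fl)                     # permutation variable importance
    plot(light_ice(fl, v = "Petal.Length"))  # ICE profiles for one variable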
Publication-ready regional gene locus plots similar to those produced by the web interface LocusZoom <https://my.locuszoom.org>, but running locally in R. Genetic or genomic data with gene annotation tracks are plotted via R base graphics, ggplot2 or plotly, allowing flexibility and easy customisation, including laying out multiple locus plots on the same page. It uses the LDlink API <https://ldlink.nih.gov/?tab=apiaccess> to query linkage disequilibrium data from the 1000 Genomes Project and can overlay this on plots <doi:10.1093/bioadv/vbaf006>.
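A hedged sketch of a typical call sequence; locus(), link_LD() and locus_plot() are the function names as I understand them, and assoc_df, the gene symbol, the Ensembl annotation package and the LDlink token are placeholders to be replaced with real inputs.

    library(locuszoomr)
    library(EnsDb.Hsapiens.v75)   # gene annotation track (Bioconductor)
    # assoc_df: a data frame of association results (chromosome, position, rsid, p-value)
    loc <- locus(data = assoc_df, gene = "UBASH3B", flank = 1e5,
                 ens_db = "EnsDb.Hsapiens.v75")
    loc <- link_LD(loc, token = "your_LDlink_token")  # LD from the LDlink API
    locus_plot(loc)                                   # base-graphics locus plot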
This package provides access to coded election programmes from the Manifesto Corpus and to the Manifesto Project's Main Dataset and routines to analyse this data. The Manifesto Project <https://manifesto-project.wzb.eu> collects and analyses election programmes across time and space to measure the political preferences of parties. The Manifesto Corpus contains the collected and annotated election programmes in the Corpus format of the package tm to enable easy use of text processing and text mining functionality. Specific functions for scaling of coded political texts are included.
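A minimal sketch, assuming the package's mp_setapikey(), mp_maindataset() and mp_corpus() functions; the API key file and the corpus query are placeholders.

    library(manifestoR)
    mp_setapikey("manifesto_apikey.txt")  # key obtained from the project website
    mpds <- mp_maindataset()              # Manifesto Project Main Dataset
    corp <- mp_corpus(countryname == "Germany" & edate > as.Date("2010-01-01"))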
Maximum likelihood estimates are obtained via an EM algorithm with either a first-order or a fully exponential Laplace approximation as documented by Broatch and Karl (2018) <doi:10.48550/arXiv.1710.05284>, Karl, Yang, and Lohr (2014) <doi:10.1016/j.csda.2013.11.019>, and by Karl (2012) <doi:10.1515/1559-0410.1471>. Karl and Zimmerman <doi:10.1016/j.jspi.2020.06.004> use this package to illustrate how the home field effect estimator from a mixed model can be biased under nonrandom scheduling.
This package addresses a critical first step in systematic literature reviews and the mining of academic texts: identifying relevant texts from a range of sources, particularly databases such as Web of Science or Scopus. These databases often export in different formats or with different metadata tags. synthesisr expands on the tools outlined by Westgate (2019) <doi:10.1002/jrsm.1374> to import bibliographic data from a range of formats (such as bibtex, ris, or ciw) in a standard way, and allows merging and deduplication of the resulting dataset.
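A hedged sketch, assuming the package's read_refs() and deduplicate() functions; the file names and deduplication settings are placeholders and may not match the documented defaults.

    library(synthesisr)
    refs <- read_refs(c("wos_export.bib", "scopus_export.ris"))  # import and merge
    refs_unique <- deduplicate(refs, match_by = "title", method = "exact")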
Analyzes shooting data with respect to group shape, precision, and accuracy. This includes graphical methods, descriptive statistics, and inference tests using standard as well as non-parametric and robust statistical methods. Implements distributions for radial error in bivariate normal variables. Works with files exported by OnTarget PC/TDS, Silver Mountain e-target, ShotMarker e-target, SIUS e-target, or Taran, as well as with custom data files in text format. Supports inference from range statistics such as extreme spread. Includes a set of web-based graphical user interfaces.
This is a package for parsing Affymetrix files (CDF, CEL, CHP, BPMAP, BAR). It provides methods for fast and memory-efficient parsing of Affymetrix files using the Affymetrix Fusion SDK. Both ASCII- and binary-based files are supported. Currently, there are methods for reading chip definition files (CDF) and cell intensity files (CEL). These files can be read either in full or in part. For example, probe signals from a few probesets can be extracted very quickly from a set of CEL files into a convenient list structure.
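A short sketch of the reading functions mentioned above; the file names are placeholders, and readCelUnits() assumes the matching CDF file can be located.

    library(affxparser)
    cdf <- readCdf("HG-U133_Plus_2.cdf")    # chip definition file
    cel <- readCel("sample1.CEL")           # one cell intensity file, in full
    # probe signals for a few units (probesets) from a set of CEL files
    sig <- readCelUnits(c("sample1.CEL", "sample2.CEL"), units = 1:5)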
This package provides an interface for working with large matrices stored in files, not in computer memory. It supports multiple non-character data types (double, integer, logical and raw) of various sizes (e.g. 8 and 4 byte real values). Access to parts of the matrix is done by indexing, exactly as with usual R matrices. It supports very large matrices; the package has been tested on multi-terabyte matrices. It allows for more than 2^32 rows or columns, and allows for quick addition of extra columns to a filematrix.
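A minimal sketch, assuming the package's fm.create() constructor; the file name base and dimensions are placeholders.

    library(filematrix)
    fm <- fm.create(filenamebase = "bigmat", nrow = 1e6, ncol = 10, type = "double")
    fm[1:3, 1:2] <- rnorm(6)   # write a block by ordinary indexing
    x  <- fm[1:3, ]            # read a block back into memory
    close(fm)                  # close the file-backed matrix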
This package provides a set of tools for the statistical analysis of data using:
normal linear models;
generalized linear models;
negative binomial regression models as an alternative to Poisson regression models in the presence of overdispersion;
beta-binomial and random-clumped binomial regression models as alternatives to binomial regression models in the presence of overdispersion;
zero-inflated and zero-altered regression models to deal with zero-excess in count data;
generalized nonlinear models;
generalized estimating equations for cluster correlated data.
Estimate the linear and nonlinear autoregressive distributed lag (ARDL & NARDL) models and the corresponding error correction models, and test for long-run and short-run asymmetry. The general-to-specific approach is also available for estimating the ARDL and NARDL models. The Pesaran, Shin & Smith (2001) (<doi:10.1002/jae.616>) bounds test for level relationships is also provided. The ardl.nardl package also performs the short-run and long-run symmetry restrictions described in Shin et al. (2014) <doi:10.1007/978-1-4899-8008-3_9> and their corresponding tests.
This package provides functions to access data from public RESTful APIs, including the World Bank API and the REST Countries API, retrieving real-time or historical information related to Algeria. The package enables users to query economic indicators and international demographic and geopolitical statistics in a reproducible way. It is designed for researchers, analysts, and developers who require reliable and programmatic access to Algerian data through established APIs. For more information on the APIs, see: World Bank API <https://datahelpdesk.worldbank.org/knowledgebase/articles/889392> and REST Countries API <https://restcountries.com/>.
Download data from the time-series databases of the Bundesbank, the German central bank. See the overview at the Bundesbank website (<https://www.bundesbank.de/en/statistics/time-series-databases>) for available series. The package provides only a single function, getSeries(), which supports both traditional and real-time datasets; it will also download metadata if available. Downloaded data can automatically be arranged in various formats, such as data frames or zoo series. The data may optionally be cached, so as to avoid repeated downloads of the same series.
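A hedged example of getSeries(); the series code and date range are illustrative, and the caching argument name follows my reading of the function and may differ.

    library(bundesbank)
    x <- getSeries("BBK01.SU0503", start = "2010-01", end = "2020-12")
    # pointing dest.dir at a directory caches downloads, as described above
    x <- getSeries("BBK01.SU0503", dest.dir = "~/bundesbank-cache")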
This package provides a collection of novel tools for generating species distribution and abundance models (SDM) that are dynamic through both space and time. These highly flexible functions incorporate spatial and temporal aspects across key SDM stages, including cleaning and filtering species occurrence data, generating pseudo-absence records, assessing and correcting sampling biases and autocorrelation, extracting explanatory variables and projecting distribution patterns. Throughout, functions utilise Google Earth Engine and Google Drive to minimise the computing power and storage demands associated with species distribution modelling at high spatio-temporal resolution.
This package provides functions to perform simulations of ANOVA designs of up to three factors. Calculates the observed power and average observed effect size for all main effects and interactions in the ANOVA, and all simple comparisons between conditions. Includes functions for analytic power calculations and additional helper functions that compute effect sizes for ANOVA designs, observed error rates in the simulations, and functions to plot power curves. Please see Lakens, D., & Caldwell, A. R. (2021). "Simulation-Based Power Analysis for Factorial Analysis of Variance Designs". <doi:10.1177/2515245920951503>.
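A hedged sketch of the simulation workflow described above, using the ANOVA_design() and ANOVA_power() functions as I understand them (the package and function names are assumptions here); the design string, means, standard deviation and correlation are illustrative.

    library(Superpower)
    # a 2x2 design with one between-subjects and one within-subjects factor
    design <- ANOVA_design(design = "2b*2w", n = 40,
                           mu = c(0, 0.3, 0.3, 0.6), sd = 1, r = 0.5)
    result <- ANOVA_power(design, nsims = 1000)  # simulated power for all effects
                                                 # and simple comparisons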
The standard index of DNA methylation (beta) is computed from methylated and unmethylated signal intensities. Betas calculated from raw signal intensities perform well, but using 11 methylomic datasets we demonstrate that quantile normalization methods produce marked improvement. The commonly used procedure of normalizing betas is inferior to the separate normalization of M and U, and it is also advantageous to normalize Type I and Type II assays separately. This package provides 15 flavours of betas and three performance metrics, with methods for objects produced by the methylumi and minfi packages.
This package provides a function to calibrate variant effect scores against evidence strength categories defined by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) guidelines. The method computes likelihood ratios of pathogenicity via kernel density estimation of pathogenic and benign score distributions, and derives score intervals corresponding to ACMG/AMP evidence levels. This enables researchers and clinical geneticists to interpret functional and computational variant scores in a reproducible and standardised manner. For details, see Badonyi and Marsh (2025) <doi:10.1093/bioinformatics/btaf503>.
Multimodal distributions can be modelled as a mixture of components. The model is derived using the Pareto Density Estimation (PDE) for an estimation of the pdf. PDE has been designed in particular to identify groups/classes in a dataset. Precise limits for the classes can be calculated using the theorem of Bayes. Verification of the model is possible by QQ plot, Chi-squared test and Kolmogorov-Smirnov test. The package is based on the publication of Ultsch, A., Thrun, M.C., Hansen-Goos, O., Lotsch, J. (2015) <DOI:10.3390/ijms161025897>.
This package provides a collection of R functions implementing published analytic solutions of the one-dimensional Boussinesq equation (groundwater). In particular, the function "beq.lin()" is the analytic solution of the linearized form of the Boussinesq equation between two different head-based boundary (Dirichlet) conditions; "beq.song" is the non-linear power-series analytic solution for the motion of a wetting front over a dry bedrock (Song et al., 2007; see the complete reference in the function documentation). Bugs/comments/questions/collaboration of any kind are warmly welcomed.
Several functions are available for calculating the most widely used effect sizes (ES), along with their variances, confidence intervals and p-values. The output includes the effect sizes d (mean difference), g (unbiased estimate of d), r (correlation coefficient), z (Fisher's z), and OR (odds ratio and log odds ratio). In addition, NNT (number needed to treat), U3, CLES (Common Language Effect Size) and Cliff's Delta are computed. This package uses recommended formulas as described in The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).
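A brief hedged example, assuming the package's des() and mes() conversion functions; the input values are illustrative.

    library(compute.es)
    des(d = 0.5, n.1 = 30, n.2 = 30)  # from a standardized mean difference
    mes(m.1 = 10, m.2 = 8, sd.1 = 2, sd.2 = 2, n.1 = 30, n.2 = 30)  # from means and SDs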
This package provides Peruvian agricultural production data from the Ministry of Agriculture of Peru (MINAGRI). The first version includes 6 crops: rice, quinoa, potato, sweet potato, tomato and wheat, all of them across 24 departments. The data were originally in Excel files and have been transformed and assembled using tidy data principles, i.e. each variable is in a column, each observation is a row and each value is in a cell. The variables are sowing and harvest area per crop, yield, production and price per plot, for every year from 2004 to 2014.
The main functions in this package are with_cache() and cached_read(). The former is a simple way to cache an R object into a file on disk, using cachem. The latter is a wrapper around any standard read function, but caches both the output and the file list info. If the input file list info hasn't changed, the cache is used; otherwise, the original files are re-read. This can save time if the original operation requires reading from many files, and/or involves lots of processing.
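A hypothetical sketch of the two functions described above; the argument names, the placeholder helper heavy_computation() and the read function used here are assumptions, not the package's documented signature.

    # cache the result of an expensive computation in a file on disk
    stats <- with_cache(heavy_computation(), "stats_cache")        # args assumed
    # wrap a read function so unchanged input files are served from the cache
    dat <- cached_read(list.files("data/", full.names = TRUE),     # args assumed
                       read_fn = utils::read.csv)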
The purpose of forecastML is to simplify the process of multi-step-ahead forecasting with standard machine learning algorithms. forecastML supports lagged, dynamic, static, and grouping features for modeling single and grouped numeric or factor/sequence time series. In addition, simple wrapper functions are used to support model-building with most R packages. This approach to forecasting is inspired by Bergmeir, Hyndman, and Koo's (2018) paper "A note on the validity of cross-validation for evaluating autoregressive time series prediction" <doi:10.1016/j.csda.2017.11.003>.
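A hedged sketch of the multi-step-ahead setup; the function and argument names follow my reading of the package and may differ, and the data and settings are illustrative.

    library(forecastML)
    dat <- data.frame(y = as.numeric(AirPassengers))
    data_train <- create_lagged_df(dat, type = "train", outcome_col = 1,
                                   horizons = 1:3, lookback = 1:12)
    windows <- create_windows(data_train, window_length = 24)
    model_fn <- function(data) lm(y ~ ., data = data)   # user-supplied wrapper
    models <- train_model(data_train, windows, model_name = "LM",
                          model_function = model_fn)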
Citrus is a computational technique developed for the analysis of high dimensional cytometry data sets. This package extracts, statistically analyzes, and visualizes marker expression from citrus data. This code was used to generate data for Figures 3 and 4 in the forthcoming manuscript: Throm et al., "Identification of Enhanced Interferon-Gamma Signaling in Polyarticular Juvenile Idiopathic Arthritis with Mass Cytometry", JCI Insight. For more information on Citrus, please see: Bruggner et al. (2014) <doi:10.1073/pnas.1408792111>. To download the citrus package, please see <https://github.com/nolanlab/citrus>.
Computes the minimum sample size required for the development of a new multivariable prediction model using the criteria proposed by Riley et al. (2018) <doi:10.1002/sim.7992>. pmsampsize can be used to calculate the minimum sample size for the development of models with continuous, binary or survival (time-to-event) outcomes. Riley et al. (2018) <doi:10.1002/sim.7992> lay out a series of criteria the sample size should meet; these aim to minimise overfitting and to ensure precise estimation of key parameters in the prediction model.
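A brief hedged example for a binary outcome; the anticipated Cox-Snell R-squared, number of candidate parameters and outcome prevalence are illustrative, and argument names may differ across package versions.

    library(pmsampsize)
    pmsampsize(type = "b", rsquared = 0.288, parameters = 24, prevalence = 0.174)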