This package provides a client for retrieving data and metadata from major central bank APIs. It supports access to the Bundesbank SDMX Web Service API (<https://www.bundesbank.de/en/statistics/time-series-databases/help-for-sdmx-web-service/web-service-interface-data>), the Swiss National Bank Data Portal (<https://data.snb.ch/en>), and the European Central Bank Data Portal API (<https://data.ecb.europa.eu/help/api/overview>).
Numerical integration of cause-specific survival curves to arrive at cause-specific cumulative incidence functions, with three usage modes: 1) Convenient API for parametric survival regression followed by competing-risk analysis, 2) API for CFC, accepting user-specified survival functions in R, and 3) Same as 2, but accepting survival functions in C++. For mathematical details and software tutorial, see Mahani and Sharabiani (2019) <DOI:10.18637/jss.v089.i09>.
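As a rough illustration of the idea described above (not the CFC package's own code), the sketch below integrates cause-specific survival curves numerically to obtain a cumulative incidence function. It assumes the cause-specific curves combine multiplicatively into the overall survival, and uses a trapezoidal rule on a fixed grid. The function name `cumulative_incidence` and the exponential curves are hypothetical and chosen only because the result can be checked against a closed form.

```python
import math


def cumulative_incidence(survival_funcs, cause, t_max, n_steps=2000):
    """Approximate the cumulative incidence of one cause on [0, t_max].

    Hypothetical helper: computes -integral of prod_{j != cause} S_j(u) dS_cause(u)
    with a trapezoidal rule on an evenly spaced time grid.
    """
    step = t_max / n_steps
    times = [i * step for i in range(n_steps + 1)]
    s_cause = survival_funcs[cause]

    def others(t):
        # Product of the survival curves of all competing causes.
        prod = 1.0
        for j, s in enumerate(survival_funcs):
            if j != cause:
                prod *= s(t)
        return prod

    cif = [0.0]
    prev_p, prev_s = others(times[0]), s_cause(times[0])
    for t in times[1:]:
        cur_p, cur_s = others(t), s_cause(t)
        # Drop in the cause-specific survival, weighted by the average
        # probability of not having failed from the other causes.
        cif.append(cif[-1] + 0.5 * (prev_p + cur_p) * (prev_s - cur_s))
        prev_p, prev_s = cur_p, cur_s
    return times, cif


# Two exponential cause-specific survival curves (illustrative rates).
rates = [0.3, 0.1]
curves = [lambda t, r=r: math.exp(-r * t) for r in rates]

times, cif0 = cumulative_incidence(curves, cause=0, t_max=5.0)
_, cif1 = cumulative_incidence(curves, cause=1, t_max=5.0)
```

For constant hazards the result can be compared with the closed form `rate_k / sum(rates) * (1 - exp(-sum(rates) * t))`, and the two cumulative incidences plus the overall survival should sum to approximately one.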
Developed as a collaboration between Earth Lab and the North Central Climate Adaptation Science Center to help users gain insights from available climate data. Includes tools and instructions for downloading climate data via a USGS API and then organizing those data for visualization and analysis. A web interface for the USGS API can be found at <http://thredds.northwestknowledge.net:8080/thredds/reacch_climate_CMIP5_aggregated_macav2_catalog.html>.
Intensity-duration-frequency (IDF) curves are a widely used analysis tool in hydrology to assess extreme values of precipitation [e.g. Mailhot et al., 2007, <doi:10.1016/j.jhydrol.2007.09.019>]. The package IDF provides functions to estimate IDF parameters for given precipitation time series on the basis of a duration-dependent generalized extreme value distribution [Koutsoyiannis et al., 1998, <doi:10.1016/S0022-1694(98)00097-3>].
Psychometric analysis and scoring of judgment data using polytomous Item-Response Theory (IRT) models, as described in Myszkowski and Storme (2019) <doi:10.1037/aca0000225> and Myszkowski (2021) <doi:10.1037/aca0000287>. A function is used to automatically compare and select models, as well as to present a variety of model-based statistics. Plotting functions are used to present category curves, as well as information, reliability and standard error functions.
Includes some procedures for latent variable modeling with a particular focus on multilevel data. The LAM package contains mean and covariance structure modelling for multivariate normally distributed data (mlnormal(); Longford, 1987; <doi:10.1093/biomet/74.4.817>), a general Metropolis-Hastings algorithm (amh(); Roberts & Rosenthal, 2001, <doi:10.1214/ss/1015346320>) and penalized maximum likelihood estimation (pmle(); Cole, Chu & Greenland, 2014; <doi:10.1093/aje/kwt245>).
Machine learning estimator specifically optimized for predictive modeling of ordered non-numeric outcomes. ocf provides forest-based estimation of the conditional choice probabilities and the covariates' marginal effects. Under an "honesty" condition, the estimates are consistent and asymptotically normal and standard errors can be obtained by leveraging the weight-based representation of the random forest predictions. Please reference the use as Di Francesco (2025) <doi:10.1080/07474938.2024.2429596>.
Derives prediction rule ensembles (PREs). Largely follows the procedure for deriving PREs as described in Friedman & Popescu (2008; <DOI:10.1214/07-AOAS148>), with adjustments and improvements. The main function pre() derives prediction rule ensembles consisting of rules and/or linear terms for continuous, binary, count, multinomial, and multivariate continuous responses. Function gpe() derives generalized prediction ensembles, consisting of rules, hinge and linear functions of the predictor variables.
An implementation of the Partition Of variation (POV) method as developed by Dr. Thomas A Little <https://thomasalittleconsulting.com> in 1993 for the analysis of semiconductor data for hard drive manufacturing. POV is based on sequential sum of squares and is an exact method that explains all observed variation. It quantitates both the between and within factor variation effects and can quantitate the influence of both continuous and categorical factors.
This package provides functions for quickly writing (and reading back) a data.frame to file in SQLite format. The name stands for *Store Tables using SQLite*, or alternatively for *Quick Store Tables* (either way, it could be pronounced as *Quest*). For data.frames containing the supported data types it is intended to work as a drop-in replacement for the write_*() and read_*() functions provided by similar packages.
This package provides functions and example files to calculate the tRNA adaptation index, a measure of the level of co-adaptation between the set of tRNA genes and the codon usage bias of protein-coding genes in a given genome. The methodology is described in dos Reis, Wernisch and Savva (2003) <doi:10.1093/nar/gkg897>, and dos Reis, Savva and Wernisch (2004) <doi:10.1093/nar/gkh834>.
Call job::job(<code here>) to run R code as an RStudio job and keep your console free in the meantime. This allows for a productive workflow while testing (multiple) long-running chunks of code. It can also be used to organize results using the RStudio Jobs GUI or to test code in a clean environment. Two RStudio Addins can be used to run selected code as a job.
This package provides a set of three two-census methods to estimate the degree of death registration coverage for a population. Implemented methods include the Generalized Growth Balance method (GGB), the Synthetic Extinct Generation method (SEG), and a hybrid of the two, GGB-SEG. Each method offers automatic estimation, but users may also specify exact parameters or use a graphical interface to guess parameters in the traditional way if desired.
OpenAI Gym is an open-source Python toolkit for developing and comparing reinforcement learning algorithms. This is a wrapper for the OpenAI Gym API, and enables access to an ever-growing variety of environments. For more details on OpenAI Gym, please see here: <https://github.com/openai/gym>. For more details on the OpenAI Gym API specification, please see here: <https://github.com/openai/gym-http-api>.
This package provides a library for generic interval manipulations using a new interval vector class. Capabilities include: locating various kinds of relationships between two interval vectors, merging overlaps within a single interval vector, splitting an interval vector on its overlapping endpoints, and applying set theoretical operations on interval vectors. Many of the operations in this package were inspired by James Allen's interval algebra, Allen (1983) <doi:10.1145/182.358434>.
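To make one of the listed operations concrete, here is a minimal sketch of merging overlaps within a single collection of intervals. It does not reproduce the package's interval vector class or its API. The function name `merge_overlaps` is hypothetical, and the sketch assumes closed intervals given as `(start, end)` pairs, so intervals that merely touch at an endpoint are merged as well.

```python
def merge_overlaps(intervals):
    """Merge overlapping closed intervals given as (start, end) pairs.

    Hypothetical helper: sorts by start point, then sweeps once, extending
    the current interval whenever the next one begins at or before its end.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the last merged interval: extend it.
            last_start, last_end = merged[-1]
            merged[-1] = (last_start, max(last_end, end))
        else:
            # Disjoint from everything seen so far: start a new interval.
            merged.append((start, end))
    return merged


result = merge_overlaps([(5, 8), (1, 3), (2, 4), (8, 10)])
print(result)  # [(1, 4), (5, 10)]
```

Sorting first means each interval only needs to be compared with the most recently merged one, so the sweep itself is linear in the number of intervals.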
Fitting and testing multinomial processing tree (MPT) models, a class of nonlinear models for categorical data. The parameters are the link probabilities of a tree-like graph and represent the latent cognitive processing steps executed to arrive at observable response categories (Batchelder & Riefer, 1999 <doi:10.3758/bf03210812>; Erdfelder et al., 2009 <doi:10.1027/0044-3409.217.3.108>; Riefer & Batchelder, 1988 <doi:10.1037/0033-295x.95.3.318>).
Utility functions to convert between the Spatial classes specified by the package sp, and the well-known binary (WKB) representation for geometry specified by the Open Geospatial Consortium. Supports Spatial objects of class SpatialPoints, SpatialPointsDataFrame, SpatialLines, SpatialLinesDataFrame, SpatialPolygons, and SpatialPolygonsDataFrame. Supports WKB geometry types Point, LineString, Polygon, MultiPoint, MultiLineString, and MultiPolygon. Includes extensions to enable creation of maps with TIBCO Spotfire.
The CLL package contains the chronic lymphocytic leukemia (CLL) gene expression data. The CLL data had 24 samples that were classified as either progressive or stable with regard to disease progression. The data came from Dr. Sabina Chiaretti at Division of Hematology, Department of Cellular Biotechnologies and Hematology, University La Sapienza, Rome, Italy and Dr. Jerome Ritz at Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts.
The fst package for R provides a fast, easy and flexible way to serialize data frames. With access speeds of multiple GB/s, fst is specifically designed to unlock the potential of high speed solid state disks. Data frames stored in the fst format have full random access, in both columns and rows. The fst format allows for random access of stored data and compression with the LZ4 and ZSTD compressors.
This package provides methods for choosing the rank of an SVD (singular value decomposition) approximation via cross validation. The package provides both Gabriel-style "block" holdouts and Wold-style "speckled" holdouts. It also includes an implementation of the SVDImpute algorithm. For more information about Bi-cross-validation, see Owen & Perry's 2009 AoAS article (at <arXiv:0908.2062>) and Perry's 2009 PhD thesis (at <arXiv:0909.3052>).
Set of forecasting tools to predict ICU bed demand using a Vector Error Correction model with a single cointegrating vector. Method described in Berta, P., Lovaglio, P.G., Paruolo, P., Verzillo, S., 2020. "Real Time Forecasting of Covid-19 Intensive Care Units demand" Health, Econometrics and Data Group (HEDG) Working Papers 20/16, HEDG, Department of Economics, University of York, <https://www.york.ac.uk/media/economics/documents/hedg/workingpapers/2020/2016.pdf>.
Structure mining from XGBoost and LightGBM models. Key functionalities of this package cover: visualisation of tree-based ensemble models, identification of interactions, measuring of variable importance, measuring of interaction importance, explanation of single prediction with break down plots (based on xgboostExplainer and iBreakDown packages). To download LightGBM, use the following link: <https://github.com/Microsoft/LightGBM>. EIX is a part of the DrWhy.AI universe.
R interface for H2O, the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
These datasets and functions accompany Wolfe and Schneider (2017) - Intuitive Introductory Statistics (ISBN: 978-3-319-56070-0) <doi:10.1007/978-3-319-56072-4>. They are used in the examples throughout the text and in the end-of-chapter exercises. The datasets are meant to cover a broad range of topics in order to appeal to the diverse set of interests and backgrounds typically present in an introductory Statistics class.