The standard linear regression theory whether frequentist or Bayesian is based on an assumed (revealed?) truth (John Tukey) attitude to models. This is reflected in the language of statistical inference which involves a concept of truth, for example confidence intervals, hypothesis testing and consistency. The motivation behind this package was to remove the word true from the theory and practice of linear regression and to replace it by approximation. The approximations considered are the least squares approximations. An approximation is called valid if it contains no irrelevant covariates. This is operationalized using the concept of a Gaussian P-value which is the probability that pure Gaussian noise is better in term of least squares than the covariate. The precise definition given in the paper, it is intuitive and requires only four simple equations. Its overwhelming advantage compared with a standard F P-value is that is is exact and valid whatever the data. In contrast F P-values are only valid for specially designed simulations. Given this a valid approximation is one where all the Gaussian P-values are less than a threshold p0 specified by the statistician, in this package with the default value 0.01. This approximations approach is not only much simpler it is overwhelmingly better than the standard model based approach. The will be demonstrated using six real data sets, four from high dimensional regression and two from vector autoregression. The simplicity and superiority of Gaussian P-values derive from their universal exactness and validity. This is in complete contrast to standard F P-values which are valid only for carefully designed simulations. The function f1st is the most important function. It is a greedy forward selection procedure which results in either just one or no approximations which may however not be valid. If the size is less than than a threshold with default value 21 then an all subset procedure is called which returns the best valid subset. A good default start is f1st(y,x,kmn=15) The best function for returning multiple approximations is f3st which repeatedly calls f1st. For more information see the web site below and the accompanying papers: L. Davies and L. Duembgen, "Covariate Selection Based on a Model-free Approach to Linear Regression with Exact Probabilities", <doi:10.48550/arXiv.2202.01553>
, L. Davies, "An Approximation Based Theory of Linear Regression", 2024, <doi:10.48550/arXiv.2402.09858>
.
This package provides functions for greenhouse gas flux calculation from chamber measurements.
Implementation of various inference and simulation tools to apply generalized additive models to bivariate dependence structures and non-simplified vine copulas.
This package provides a collection of functions to perform Gaussian quadrature with different weight functions corresponding to the orthogonal polynomials in package orthopolynom. Examples verify the orthogonality and inner products of the polynomials.
An excerpt of the data available at Gapminder.org. For each of 142 countries, the package provides values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.
The main purpose of this package is to allow fitting of mixture distributions with generalised additive models for location scale and shape models see Chapter 7 of Stasinopoulos et al. (2017) <doi:10.1201/b21973-4>.
This is a dataset package for GANPA, which implements a network-based gene weighting approach to pathway analysis. This package includes data useful for GANPA, such as a functional association network, pathways, an expression dataset and multi-subunit proteins.
This is an add on package to GAMLSS. The purpose of this package is to allow users to defined truncated distributions in GAMLSS models. The main function gen.trun()
generates truncated version of an existing GAMLSS family distribution.
This package provides a collection difference measures for multivariate Gaussian probability density functions, such as the Euclidea mean, the Mahalanobis distance, the Kullback-Leibler divergence, the J-Coefficient, the Minkowski L2-distance, the Chi-square divergence and the Hellinger Coefficient.
Uses a slice sampling-based Markov chain Monte Carlo to conduct Bayesian fitting and inference for generalized additive mixed models. Generalized linear mixed models and generalized additive models are also handled as special cases of generalized additive mixed models. The methodology and software is described in Pham, T.H. and Wand, M.P. (2018). Australian and New Zealand Journal of Statistics, 60, 279-330 <DOI:10.1111/ANZS.12241>.
Implementation of a common set of punctual solutions for Cooperative Game Theory.
An R interface to the Galvanize Highbond API <https://docs-apis.highbond.com>.
Interface for extra smooth functions including tensor products, neural networks and decision trees.
Allows user to choose/gate a region on the plot and returns points within it.
Display a random fact about Carl Friedrich Gauss based the on collection curated by Mike Cavers via the <http://gaussfacts.com> site.
Density, distribution function, quantile function and random generation for the bimodal skew symmetric normal distribution of Hassan and El-Bassiouni (2016) <doi:10.1080/03610926.2014.882950>.
This package provides functions to fit two-dimensional Gaussian functions, predict values from fits, and produce plots of predicted data via either ggplot2 or base R plotting.
Given a vector of cluster memberships for a cell population, identifies a sequence of gates (polygon filters on 2D scatter plots) for isolation of that cell type.
GA4GHshiny package provides an easy way to interact with data servers based on Global Alliance for Genomics and Health (GA4GH) genomics API through a Shiny application. It also integrates with Beacon Network.
This package provides functions to estimate the disparities across categories (e.g. Black and white) that persists if a treatment variable (e.g. college) is equalized. Makes estimates by treatment modeling, outcome modeling, and doubly-robust augmented inverse probability weighting estimation, with standard errors calculated by a nonparametric bootstrap. Cross-fitting is supported. Survey weights are supported for point estimation but not for standard error estimation; those applying this package with complex survey samples should consult the data distributor to select an appropriate approach for standard error construction, which may involve calling the functions repeatedly for many sets of replicate weights provided by the data distributor. The methods in this package are described in Lundberg (2021) <doi:10.31235/osf.io/gx4y3>.
This is an add-on package to gamlss'. The purpose of this package is to allow users to fit GAMLSS (Generalised Additive Models for Location Scale and Shape) models when the response variable is defined either in the intervals [0,1), (0,1] and [0,1] (inflated at zero and/or one distributions), or in the positive real line including zero (zero-adjusted distributions). The mass points at zero and/or one are treated as extra parameters with the possibility to include a linear predictor for both. The package also allows transformed or truncated distributions from the GAMLSS family to be used for the continuous part of the distribution. Standard methods and GAMLSS diagnostics can be used with the resulting fitted object.
Demos for smoothing and gamlss.family distributions.
This package provides data used as examples to demonstrate GAMLSS models.
This package contains infrastructure for using mboost::gamboost()
in order to estimate multistate models.