The Global Biodiversity Information Facility ('GBIF', <https://www.gbif.org>) sources data from an international network of data providers, known as nodes'. Several of these nodes - the "living atlases" (<https://living-atlases.gbif.org>) - maintain their own web services using software originally developed by the Atlas of Living Australia ('ALA', <https://www.ala.org.au>). galah enables the R community to directly access data and resources hosted by GBIF and its partner nodes.
Fits linear regression, logistic and multinomial regression models, Poisson regression, Cox model via Global Adaptive Generative Adjustment Algorithm. For more detailed information, see Bin Wang, Xiaofei Wang and Jianhua Guo (2022) <arXiv:1911.00658>
. This paper provides the theoretical properties of Gaga linear model when the load matrix is orthogonal. Further study is going on for the nonorthogonal cases and generalized linear models. These works are in part supported by the National Natural Foundation of China (No.12171076).
An interface for fitting generalized additive models (GAMs) and generalized additive mixed models (GAMMs) using the lme4 package as the computational engine, as described in Helwig (2024) <doi:10.3390/stats7010003>. Supports default and formula methods for model specification, additive and tensor product splines for capturing nonlinear effects, and automatic determination of spline type based on the class of each predictor. Includes an S3 plot method for visualizing the (nonlinear) model terms, an S3 predict method for forming predictions from a fit model, and an S3 summary method for conducting significance testing using the Bayesian interpretation of a smoothing spline.
The accurate annotation of genes and Quantitative Trait Loci (QTLs) located within candidate markers and/or regions (haplotypes, windows, CNVs, etc) is a crucial step the most common genomic analyses performed in livestock, such as Genome-Wide Association Studies or transcriptomics. The Genomic Annotation in Livestock for positional candidate LOci (GALLO) is an R package designed to provide an intuitive and straightforward environment to annotate positional candidate genes and QTLs from high-throughput genetic studies in livestock. Moreover, GALLO allows the graphical visualization of gene and QTL annotation results, data comparison among different grouping factors (e.g., methods, breeds, tissues, statistical models, studies, etc.), and QTL enrichment in different livestock species including cattle, pigs, sheep, and chicken, among others.
Includes the ga.lts()
function that estimates LTS (Least Trimmed Squares) parameters using genetic algorithms and C-steps. ga.lts()
constructs a genetic algorithm to form a basic subset and iterates C-steps as defined in Rousseeuw and van-Driessen (2006) to calculate the cost value of the LTS criterion. OLS (Ordinary Least Squares) regression is known to be sensitive to outliers. A single outlying observation can change the values of estimated parameters. LTS is a resistant estimator even the number of outliers is up to half of the data. This package is for estimating the LTS parameters with lower bias and variance in a reasonable time. Version >=1.3 includes the function medmad for fast outlier detection in linear regression.
Estimates statistically significant marker combination values within which one immunologically distinctive group (i.e., disease case) is more associated than another group (i.e., healthy control), successively, using various combinations (i.e., "gates") of markers to examine features of cells that may be different between groups. For a two-group comparison, the gateR
package uses the spatial relative risk function estimated using the sparr package. Details about the sparr package methods can be found in the tutorial: Davies et al. (2018) <doi:10.1002/sim.7577>. Details about kernel density estimation can be found in J. F. Bithell (1990) <doi:10.1002/sim.4780090616>. More information about relative risk functions using kernel density estimation can be found in J. F. Bithell (1991) <doi:10.1002/sim.4780101112>.
The genetic algorithm can be used directly to find the similarity of users and more effectively to increase the efficiency of the collaborative filtering method. By identifying the nearest neighbors to the active user, before the genetic algorithm, and by identifying suitable starting points, an effective method for user-based collaborative filtering method has been developed. This package uses an optimization algorithm (continuous genetic algorithm) to directly find the optimal similarities between active users (users for whom current recommendations are made) and others. First, by determining the nearest neighbor and their number, the number of genes in a chromosome is determined. Each gene represents the neighbor's similarity to the active user. By estimating the starting points of the genetic algorithm, it quickly converges to the optimal solutions. The positive point is the independence of the genetic algorithm on the number of data that for big data is an effective help in solving the problem.
Robust regression via gamma-divergence with L1, elastic net and ridge.
Streamlines downloading and cleaning biodiversity data from Integrated Digitized Biocollections (iDigBio
) and the Global Biodiversity Information Facility (GBIF).
Data sets and scripts used in the book Generalized Additive Models: An Introduction with R', Wood (2006,2017) CRC.
This package provides utilities for working with Google APIs. This includes functions and classes for handling common credential types and for preparing, executing, and processing HTTP requests.
This package provides an effective machine learning-based tool that quantifies the gain of passive device installation on wind turbine generators. H. Hwangbo, Y. Ding, and D. Cabezon (2019) <arXiv:1906.05776>
.
This is a package for the manipulation of genetic data (SNPs). Computation of genetic relationship matrix (GRM) and dominance matrix, linkage disequilibrium (LD), and heritability with efficient algorithms for linear mixed models (AIREML).
The gap encodes the distance between clusters and improves interpretation of cluster heatmaps. The gaps can be of the same distance based on a height threshold to cut the dendrogram. Another option is to vary the size of gaps based on the distance between clusters.
Conducts hierarchical partitioning to calculate individual contributions of each predictor towards adjusted R2 and explained deviance for generalized additive models based on output of gam()
in mgcv package, applying the algorithm in this paper: Lai(2024) <doi:10.1016/j.pld.2024.06.002>.
Gene and Region Counting of Mutations (GARCOM) package computes mutation (or alleles) counts per gene per individuals based on gene annotation or genomic base pair boundaries. It comes with features to accept data formats in plink(.raw) and VCF. It provides users flexibility to extract and filter individuals, mutations and genes of interest.
This package provides functions for fitting the generalized additive models for location scale and shape introduced by Rigby and Stasinopoulos (2005), doi:10.1111/j.1467-9876.2005.00510.x. The models use a distributional regression approach where all the parameters of the conditional distribution of the response variable are modelled using explanatory variables.
Using overlap grouped-lasso penalties, gamsel selects whether a term in a gam is nonzero, linear, or a non-linear spline (up to a specified max df per variable). It fits the entire regularization path on a grid of values for the overall penalty lambda, both for gaussian and binomial families. See <doi:10.48550/arXiv.1506.03850>
for more details.
Fits unimodal and multimodal gambin distributions to species-abundance distributions from ecological data, as in in Matthews et al. (2014) <DOI:10.1111/ecog.00861>. gambin is short for gamma-binomial'. The main function is fit_abundances()
, which estimates the alpha parameter(s) of the gambin distribution using maximum likelihood. Functions are also provided to generate the gambin distribution and for calculating likelihood statistics.
R function gawdis()
produces multi-trait dissimilarity with more uniform contributions of different traits. de Bello et al. (2021) <doi:10.1111/2041-210X.13537> presented the approach based on minimizing the differences in the correlation between the dissimilarity of each trait, or groups of traits, and the multi-trait dissimilarity. This is done using either an analytic or a numerical solution, both available in the function.
Introduces a Copilot'-like completion experience, but it knows how to talk to the objects in your R environment. ellmer chats are integrated directly into your RStudio and Positron sessions, automatically incorporating relevant context from surrounding lines of code and your global environment (like data frame columns and types). Open the package dialog box with a keyboard shortcut, type your request, and the assistant will stream its response directly into your documents.
Fits a Gaussian process model to data. Gaussian processes are commonly used in computer experiments to fit an interpolating model. The model is stored as an R6 object and can be easily updated with new data. There are options to run in parallel, and Rcpp has been used to speed up calculations. For more info about Gaussian process software, see Erickson et al. (2018) <doi:10.1016/j.ejor.2017.10.002>.
Organize a so-called ragged array as generalized arrays, which is simply an array with sub-dimensions denoting the subdivision of dimensions (grouping of members within dimensions). By the margins (names of dimensions and sub-dimensions) in generalized arrays, operators and utility functions provided in this package automatically match the margins, doing map-reduce style parallel computation along margins. Generalized arrays are also cooperative to R's native functions that work on simple arrays.
It gathers information, meta-data and scripts in a two-part Henry-Stewart talk by Zhao (2009, <doi:10.69645/DCRY5578>), which showcases analysis in aspects such as testing of polymorphic variant(s) for Hardy-Weinberg equilibrium, association with trait using genetic and statistical models as well as Bayesian implementation, power calculation in study design and genetic annotation. It also covers R integration with the Linux environment, GitHub
, package creation and web applications.