The Global Biodiversity Information Facility ('GBIF', <https://www.gbif.org>) sources data from an international network of data providers, known as nodes'. Several of these nodes - the "living atlases" (<https://living-atlases.gbif.org>) - maintain their own web services using software originally developed by the Atlas of Living Australia ('ALA', <https://www.ala.org.au>). galah enables the R community to directly access data and resources hosted by GBIF and its partner nodes.
This package provides a set of functions to estimate haziness of an image based on RGB bands. It returns a haze factor, varying from 0 to 1, a metric for fogginess and cloudiness. The package also presents additional functions to estimate brightness, darkness and contrast rasters of the RGB image. This package can be used for several applications such as inference of weather quality data and performing environmental studies from interpreting digital images.
Advanced methods for a valuable quantitative environmental risk assessment using Bayesian inference of survival and reproduction Data. Among others, it facilitates Bayesian inference of the general unified threshold model of survival (GUTS). See our companion paper Baudrot and Charles (2021) <doi:10.21105/joss.03200>, as well as complementary details in Baudrot et al. (2018) <doi:10.1021/acs.est.7b05464> and Delignette-Muller et al. (2017) <doi:10.1021/acs.est.6b05326>.
This is an EM algorithm based method for imputation of missing values in multivariate normal time series. The imputation algorithm accounts for both spatial and temporal correlation structures. Temporal patterns can be modeled using an ARIMA(p,d,q), optionally with seasonal components, a non-parametric cubic spline or generalized additive models with exogenous covariates. This algorithm is specially tailored for climate data with missing measurements from several monitors along a given region.
This package implements the procedure from G. J. Ross (2021) - "Nonparametric Detection of Multiple Location-Scale Change Points via Wild Binary Segmentation" <arxiv:2107.01742>. This uses a version of Wild Binary Segmentation to detect multiple location-scale (i.e. mean and/or variance) change points in a sequence of univariate observations, with a strict control on the probability of incorrectly detecting a change point in a sequence which does not contain any.
An estimation procedure for the analysis of nonparametric proportional hazards model (e.g. h(t) = h0(t)exp(b(t)'Z)), providing estimation of b(t) and its pointwise standard errors, and semiparametric proportional hazards model (e.g. h(t) = h0(t)exp(b(t)'Z1 + c*Z2)), providing estimation of b(t), c and their standard errors. More details can be found in Lu Tian et al. (2005) <doi:10.1198/016214504000000845>.
Implementation of a likelihood ratio test of differential onset of senescence between two groups. Given two groups with measures of age and of an individual trait likely to be subjected to senescence (e.g. body mass), OnAge
provides an asymptotic p-value for the null hypothesis that senescence starts at the same age in both groups. The package implements the procedure used in Douhard et al. (2017) <doi:10.1111/oik.04421>.
Estimate the size of a networked population based on respondent-driven sampling data. The package is part of the "RDS Analyst" suite of packages for the analysis of respondent-driven sampling data. See Handcock, Gile and Mar (2014) <doi:10.1214/14-EJS923>, Handcock, Gile and Mar (2015) <doi:10.1111/biom.12255>, Kim and Handcock (2021) <doi:10.1093/jssam/smz055>, and McLaughlin
, et. al. (2023) <doi:10.1214/23-AOAS1807>.
Structural equation modeling is a powerful statistical approach for the testing of networks of direct and indirect theoretical causal relationships in complex data sets with inter-correlated dependent and independent variables. Here we implement a simple method for spatially explicit structural equation modeling based on the analysis of variance co-variance matrices calculated across a range of lag distances. This method provides readily interpreted plots of the change in path coefficients across scale.
The Time-Delay Correlation algorithm (TDCor) reconstructs the topology of a gene regulatory network (GRN) from time-series transcriptomic data. The algorithm is described in details in Lavenus et al., Plant Cell, 2015. It was initially developed to infer the topology of the GRN controlling lateral root formation in Arabidopsis thaliana. The time-series transcriptomic dataset which was used in this study is included in the package to illustrate how to use it.
This package provides functions to combine data.frames in ways that require additional effort in base R, and to add metadata (id, title, ...) that can be used for printing and xlsx export. The Tatoo_report class is provided as a convenient helper to write several such tables to a workbook, one table per worksheet. Tatoo is built on top of openxlsx', but intimate knowledge of that package is not required to use tatoo.
This Rcpp-based package implements a highly efficient data structure and algorithm for performing alignment of short reads from CRISPR or shRNA
screens to reference barcode library. Sequencing error are considered and matching qualities are evaluated based on Phred scores. A Bayes classifier is employed to predict the originating barcode of a read. The package supports provision of user-defined probability models for evaluating matching qualities. The package also supports multi-threading.
PhILR
is short for Phylogenetic Isometric Log-Ratio Transform. This package provides functions for the analysis of compositional data (e.g., data representing proportions of different variables/parts). Specifically this package allows analysis of compositional data where the parts can be related through a phylogenetic tree (as is common in microbiota survey data) and makes available the Isometric Log Ratio transform built from the phylogenetic tree and utilizing a weighted reference measure.
This package provides an implementation of the ACME estimator, described in Wolpert (2015), ACME: A Partially Periodic Estimator of Avian & Chiropteran Mortality at Wind Turbines. Unlike most other models, this estimator supports decreasing-hazard Weibull model for persistence; decreasing search proficiency as carcasses age; variable bleed-through at successive searches; and interval mortality estimates. The package provides, based on search data, functions for estimating the mortality inflation factor in Frequentist and Bayesian settings.
This package provides an R wrapper for libnabo, an exact or approximate k nearest neighbour library which is optimised for low dimensional spaces (e.g. 3D). nabor
includes a knn
function that is designed as a drop-in replacement for the RANN function nn2
. In addition, objects which include the k-d tree search structure can be returned to speed up repeated queries of the same set of target points.
R comes with a suite of utilities for linear algebra with "numeric" (double precision) vectors/matrices. However, sometimes single precision (or less!) is more than enough for a particular task. This package extends R's linear algebra facilities to include 32-bit float (single precision) data. Float vectors/matrices have half the precision of their "numeric"-type counterparts but are generally faster to numerically operate on, for a performance vs accuracy trade-off.
Analyzes autocorrelation and partial autocorrelation using surrogate methods and bootstrapping, and computes the acceleration constants for the vectorized moving block bootstrap provided by this package. It generates percentile, bias-corrected, and accelerated intervals and estimates partial autocorrelations using Durbin-Levinson. This package calculates the autocorrelation power spectrum, computes cross-correlations between two time series, computes bandwidth for any time series, and performs autocorrelation frequency analysis. It also calculates the periodicity of a time series.
Provide a tool to easily build customized data flows to pre-process large volumes of information from different sources. To this end, bdpar allows to (i) easily use and create new functionalities and (ii) develop new data source extractors according to the user needs. Additionally, the package provides by default a predefined data flow to extract and pre-process the most relevant information (tokens, dates, ... ) from some textual sources (SMS, Email, YouTube
comments).
An implementation of the k-means-- algorithm proposed by Chawla and Gionis, 2013 in their paper, "k-means-- : A unified approach to clustering and outlier detection. SIAM International Conference on Data Mining (SDM13)", <doi:10.1137/1.9781611972832.21> and using ordering described by Howe, 2013 in the thesis, Clustering and anomaly detection in tropical cyclones". Useful for creating (potentially) tighter clusters than standard k-means and simultaneously finding outliers inexpensively in multidimensional space.
Algorithms for solving various Maximum Weight Connected Subgraph Problems, including variants with budget constraints, cardinality constraints, weighted edges and signals. The package represents an R interface to high-efficient solvers based on relax-and-cut approach (Ã lvarez-Miranda E., Sinnl M. (2017) <doi:10.1016/j.cor.2017.05.015>) mixed-integer programming (Loboda A., Artyomov M., and Sergushichev A. (2016) <doi:10.1007/978-3-319-43681-4_17>) and simulated annealing.
An adaptation of Non-dominated Sorting Genetic Algorithm III for multi objective feature selection tasks. Non-dominated Sorting Genetic Algorithm III is a genetic algorithm that solves multiple optimization problems simultaneously by applying a non-dominated sorting technique. It uses a reference points based selection operator to explore solution space and preserve diversity. See the original paper by K. Deb and H. Jain (2014) <DOI:10.1109/TEVC.2013.2281534> for a detailed description.
Calculating Pst values to assess differentiation among populations from a set of quantitative traits is the primary purpose of such a package. The bootstrap method provides confidence intervals and distribution histograms of Pst. Variations of Pst in function of the parameter c/h^2 are studied as well. Finally, the package proposes different transformations especially to eliminate any variation resulting from allometric growth (calculation of residuals from linear regressions, Reist standardizations or Aitchison transformation).
Do Markov chain Monte Carlo (MCMC) simulation of Potts models (Potts, 1952, <doi:10.1017/S0305004100027419>), which are the multi-color generalization of Ising models (so, as as special case, also simulates Ising models). Use the Swendsen-Wang algorithm (Swendsen and Wang, 1987, <doi:10.1103/PhysRevLett.58.86>
) so MCMC is fast. Do maximum composite likelihood estimation of parameters (Besag, 1975, <doi:10.2307/2987782>, Lindsay, 1988, <doi:10.1090/conm/080>).
This package contains functions that fit linear mixed-effects models for high-dimensional data (p>>n) with penalty for both the fixed effects and random effects for variable selection. The details of the algorithm can be found in Luoying Yang PhD
thesis (Yang and Wu 2020). The algorithm implementation is based on the R package lmmlasso'. Reference: Yang L, Wu TT (2020). Model-Based Clustering of Longitudinal Data in High-Dimensionality. Unpublished thesis.