Application of genome prediction for a continuous variable, focused on genotype by environment (GE) genomic selection models (GS). It consists a group of functions that help to create regression kernels for some GE genomic models proposed by Jarquà n et al. (2014) <doi:10.1007/s00122-013-2243-1> and Lopez-Cruz et al. (2015) <doi:10.1534/g3.114.016097>. Also, it computes genomic predictions based on Bayesian approaches. The prediction function uses an orthogonal transformation of the data and specific priors present by Cuevas et al. (2014) <doi:10.1534/g3.114.013094>.
This package creates survey designs for distance sampling surveys. These designs can be assessed for various effort and coverage statistics. Once the user is satisfied with the design characteristics they can generate a set of transects to use in their distance sampling survey. Many of the designs implemented in this R package were first made available in our Distance for Windows software and are detailed in Chapter 7 of Advanced Distance Sampling, Buckland et. al. (2008, ISBN-13: 978-0199225873). Find out more about estimating animal/plant abundance with distance sampling at <https://distancesampling.org/>.
This package provides a program for Bayesian analysis of univariate normal mixtures with an unknown number of components, following the approach of Richardson and Green (1997) <doi:10.1111/1467-9868.00095>. This makes use of reversible jump Markov chain Monte Carlo methods that are capable of jumping between the parameter sub-spaces corresponding to different numbers of components in the mixture. A sample from the full joint distribution of all unknown variables is thereby generated, and this can be used as a basis for a thorough presentation of many aspects of the posterior distribution.
Supplementary utils for CRAN maintainers and R packages developers. Validating the library, packages and lock files. Exploring a complexity of a specific package like evaluating its size in bytes with all dependencies. The shiny app complexity could be explored too. Assessing the life duration of a specific package version. Checking a CRAN package check page status for any errors and warnings. Retrieving a DESCRIPTION or NAMESPACE file for any package version. Comparing DESCRIPTION or NAMESPACE files between different package versions. Getting a list of all releases for a specific package. The Bioconductor is partly supported.
This package provides a multiple testing procedure for testing several groups of hypotheses is implemented. Linear dependency among the hypotheses within the same group is modeled by using hidden Markov Models. It is noted that a smaller p value does not necessarily imply more significance due to the dependency. A typical application is to analyze genome wide association studies datasets, where SNPs from the same chromosome are treated as a group and exhibit strong linear genomic dependency. See Wei Z, Sun W, Wang K, Hakonarson H (2009) <doi:10.1093/bioinformatics/btp476> for more details.
Code to identify functional enrichments across diverse taxa in phylogenetic tree, particularly where these taxa differ in abundance across samples in a non-random pattern. The motivation for this approach is to identify microbial functions encoded by diverse taxa that are at higher abundance in certain samples compared to others, which could indicate that such functions are broadly adaptive under certain conditions. See GitHub
repository for tutorial and examples: <https://github.com/gavinmdouglas/POMS/wiki>. Citation: Gavin M. Douglas, Molly G. Hayes, Morgan G. I. Langille, Elhanan Borenstein (2022) <doi:10.1093/bioinformatics/btac655>.
This package provides a system to plan analyses within the mental model where you have one (or more) datasets and want to run either A) the same function multiple times with different arguments, or B) multiple functions. This is appropriate when you have multiple strata (e.g. locations, age groups) that you want to apply the same function to, or you have multiple variables (e.g. exposures) that you want to apply the same statistical method to, or when you are creating the output for a report and you need multiple different tables or graphs.
This R package assists breeders in linking data systems with their analytic pipelines, a crucial step in digitizing breeding processes. It supports querying and retrieving phenotypic and genotypic data from systems like EBS <https://ebs.excellenceinbreeding.org/>, BMS <https://bmspro.io>, BreedBase
<https://breedbase.org>, GIGWA <https://github.com/SouthGreenPlatform/Gigwa2>
(using BrAPI
<https://brapi.org> calls), , and Germinate <https://germinateplatform.github.io/get-germinate/>. Extra helper functions support environmental data sources, including TerraClimate
<https://www.climatologylab.org/terraclimate.html> and FAO HWSDv2 <https://gaez.fao.org/pages/hwsd> soil database.
Estimate the transition diagnostic classification model (TDCM) described in Madison & Bradshaw (2018) <doi:10.1007/s11336-018-9638-5>, a longitudinal extension of the log-linear cognitive diagnosis model (LCDM) in Henson, Templin & Willse (2009) <doi:10.1007/s11336-008-9089-5>. As the LCDM subsumes many other diagnostic classification models (DCMs), many other DCMs can be estimated longitudinally via the TDCM. The TDCM package includes functions to estimate the single-group and multigroup TDCM, summarize results of interest including item parameters, growth proportions, transition probabilities, transitional reliability, attribute correlations, model fit, and growth plots.
This package provides an up-to-date copy of the Internet Assigned Numbers Authority (IANA) Time Zone Database. It is updated periodically to reflect changes made by political bodies to time zone boundaries, UTC offsets, and daylight saving time rules. Additionally, this package provides a C++ interface for working with the date
library. date
provides comprehensive support for working with dates and date-times, which this package exposes to make it easier for other R packages to utilize. Headers are provided for calendar specific calculations, along with a limited interface for time zone manipulations.
This package provides methods to detect differential item functioning (DIF) in dichotomous and polytomous items, using both classical and modern approaches. These include Mantel-Haenszel procedures, logistic regression (including ordinal models), and regularization-based methods such as LASSO. Uniform and non-uniform DIF effects can be detected, and some methods support multiple focal groups. The package also provides tools for anchor purification, rest score matching, effect size estimation, and DIF simulation. See Magis, Beland, Tuerlinckx, and De Boeck (2010, Behavior Research Methods, 42, 847â 862, <doi:10.3758/BRM.42.3.847>) for a general overview.
Model fitting and species biotic interaction network topology selection for explicit interaction community models. Explicit interaction community models are an extension of binomial linear models for joint modelling of species communities, that incorporate both the effects of species biotic interactions and the effects of missing covariates. Species interactions are modelled as direct effects of each species on each of the others, and are estimated alongside the effects of missing covariates, modelled as latent factors. The package includes a penalized maximum likelihood fitting function, and a genetic algorithm for selecting the most parsimonious species interaction network topology.
Evidence of Absence software (EoA
) is a user-friendly application for estimating bird and bat fatalities at wind farms and designing search protocols. The software is particularly useful in addressing whether the number of fatalities has exceeded a given threshold and what search parameters are needed to give assurance that thresholds were not exceeded. The models are applicable even when zero carcasses have been found in searches, following Huso et al. (2015) <doi:10.1890/14-0764.1>, Dalthorp et al. (2017) <doi:10.3133/ds1055>, and Dalthorp and Huso (2015) <doi:10.3133/ofr20151227>.
Mining informative genes with certain biological meanings are important for clinical diagnosis of disease and discovery of disease mechanisms in plants and animals. This process involves identification of relevant genes and removal of redundant genes as much as possible from a whole gene set. This package selects the informative genes related to a specific trait using gene expression dataset. These trait specific genes are considered as informative genes. This package returns the informative gene set from the high dimensional gene expression data using a combination of methods SVM and MRMR (for feature selection) with bootstrapping procedure.
Allows to map species richness and endemism based on stacked species distribution models (SSDM). Individuals SDMs can be created using a single or multiple algorithms (ensemble SDMs). For each species, an SDM can yield a habitat suitability map, a binary map, a between-algorithm variance map, and can assess variable importance, algorithm accuracy, and between- algorithm correlation. Methods to stack individual SDMs include summing individual probabilities and thresholding then summing. Thresholding can be based on a specific evaluation metric or by drawing repeatedly from a Bernoulli distribution. The SSDM package also provides a user-friendly interface.
An easy to use implementation of routine structural missing data diagnostics with functions to visualize the proportions of missing observations, investigate missing data patterns and conduct various empirical missing data diagnostic tests. Reference: Weberpals J, Raman SR, Shaw PA, Lee H, Hammill BG, Toh S, Connolly JG, Dandreo KJ, Tian F, Liu W, Li J, Hernández-Muñoz JJ, Glynn RJ, Desai RJ. smdi: an R package to perform structural missing data investigations on partially observed confounders in real-world evidence studies. JAMIA Open. 2024 Jan 31;7(1):ooae008. <doi:10.1093/jamiaopen/ooae008>.
The sdrt()
function is designed for estimating subspaces for Sufficient Dimension Reduction (SDR) in time series, with a specific focus on the Time Series Central Mean subspace (TS-CMS). The package employs the Fourier transformation method proposed by Samadi and De Alwis (2023) <doi:10.48550/arXiv.2312.02110>
and the Nadaraya-Watson kernel smoother method proposed by Park et al. (2009) <doi:10.1198/jcgs.2009.08076> for estimating the TS-CMS. The package provides tools for estimating distances between subspaces and includes functions for selecting model parameters using the Fourier transformation method.
This package provides interface to the Copernicus Data Space Ecosystem API <https://dataspace.copernicus.eu/analyse/apis>, mainly for searching the catalog of available data from Copernicus Sentinel missions and obtaining the images for just the area of interest based on selected spectral bands. The package uses the Sentinel Hub REST API interface <https://dataspace.copernicus.eu/analyse/apis/sentinel-hub> that provides access to various satellite imagery archives. It allows you to access raw satellite data, rendered images, statistical analysis, and other features. This package is in no way officially related to or endorsed by Copernicus.
Solves multivariate least squares (MLS) problems subject to constraints on the coefficients, e.g., non-negativity, orthogonality, equality, inequality, monotonicity, unimodality, smoothness, etc. Includes flexible functions for solving MLS problems subject to user-specified equality and/or inequality constraints, as well as a wrapper function that implements 24 common constraint options. Also does k-fold or generalized cross-validation to tune constraint options for MLS problems. See ten Berge (1993, ISBN:9789066950832) for an overview of MLS problems, and see Goldfarb and Idnani (1983) <doi:10.1007/BF02591962> for a discussion of the underlying quadratic programming algorithm.
Fit model for datasets with easy-to-interpret Gaussian process modeling, predict responses for new inputs. The input variables of the datasets can be quantitative, qualitative/categorical or mixed. The output variable of the datasets is a scalar (quantitative). The optimization of the likelihood function can be chosen by the users (see the documentation of EzGP_fit()
). The modeling method is published in "EzGP
: Easy-to-Interpret Gaussian Process Models for Computer Experiments with Both Quantitative and Qualitative Factors" by Qian Xiao, Abhyuday Mandal, C. Devon Lin, and Xinwei Deng (2022) <doi:10.1137/19M1288462>.
Perform mediation analysis in the presence of high-dimensional mediators based on the potential outcome framework. High dimensional Bayesian mediation (HDBM), developed by Song et al (2018) <doi:10.1101/467399>, relies on two Bayesian sparse linear mixed models to simultaneously analyze a relatively large number of mediators for a continuous exposure and outcome assuming a small number of mediators are truly active. This sparsity assumption also allows the extension of univariate mediator analysis by casting the identification of active mediators as a variable selection problem and applying Bayesian methods with continuous shrinkage priors on the effects.
This package provides a method to integrate molecular profiles of cancer patients (gene copy number and mRNA
abundance) to identify candidate gain of function alterations. These candidate alterations can be subsequently further tested to discover cancer driver alterations. Briefly, this method tests of genomic correlates of mRNA
dysregulation and prioritise those where DNA gains/amplifications are associated with elevated mRNA
expression of the same gene. For details see, Haider S et al. (2016) "Genomic alterations underlie a pan-cancer metabolic shift associated with tumour hypoxia", Genome Biology, <https://pubmed.ncbi.nlm.nih.gov/27358048/>.
Structurally guided sampling (SGS) approaches for airborne laser scanning (ALS; LIDAR). Primary functions provide means to generate data-driven stratifications & methods for allocating samples. Intermediate functions for calculating and extracting important information about input covariates and samples are also included. Processing outcomes are intended to help forest and environmental management practitioners better optimize field sample placement as well as assess and augment existing sample networks in the context of data distributions and conditions. ALS data is the primary intended use case, however any rasterized remote sensing data can be used, enabling data-driven stratifications and sampling approaches.
Feature screening is a powerful tool in processing ultrahigh dimensional data. It attempts to screen out most irrelevant features in preparation for a more elaborate analysis. Xu and Chen (2014)<doi:10.1080/01621459.2013.879531> proposed an effective screening method SMLE, which naturally incorporates the joint effects among features in the screening process. This package provides an efficient implementation of SMLE-screening for high-dimensional linear, logistic, and Poisson models. The package also provides a function for conducting accurate post-screening feature selection based on an iterative hard-thresholding procedure and a user-specified selection criterion.