Model fitting and species biotic interaction network topology selection for explicit interaction community models. Explicit interaction community models extend binomial linear models for the joint modelling of species communities, incorporating both the effects of species biotic interactions and the effects of missing covariates. Species interactions are modelled as direct effects of each species on each of the others and are estimated alongside the effects of missing covariates, which are modelled as latent factors. The package includes a penalized maximum likelihood fitting function and a genetic algorithm for selecting the most parsimonious species interaction network topology.
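A minimal fitting sketch, assuming the package's main function is eicm() and that it accepts a sites-by-species occurrence matrix with optional environmental covariates; the function name and arguments are illustrative assumptions, not taken from the description above:

    library(eicm)
    # Simulated sites-by-species presence/absence matrix and one covariate
    occurrences <- matrix(rbinom(20 * 8, 1, 0.4), nrow = 20, ncol = 8)
    env <- matrix(rnorm(20), ncol = 1, dimnames = list(NULL, "temperature"))
    # Fit the explicit interaction community model; topology selection by the
    # genetic algorithm is assumed to be driven through the same interface
    fit <- eicm(occurrences = occurrences, env = env)
    fit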
Evidence of Absence software (EoA) is a user-friendly application for estimating bird and bat fatalities at wind farms and designing search protocols. The software is particularly useful in addressing whether the number of fatalities has exceeded a given threshold and what search parameters are needed to give assurance that thresholds were not exceeded. The models are applicable even when zero carcasses have been found in searches, following Huso et al. (2015) <doi:10.1890/14-0764.1>, Dalthorp et al. (2017) <doi:10.3133/ds1055>, and Dalthorp and Huso (2015) <doi:10.3133/ofr20151227>.
Mining informative genes with certain biological meaning is important for clinical diagnosis of disease and discovery of disease mechanisms in plants and animals. This process involves identifying relevant genes and removing redundant genes as far as possible from a whole gene set. This package selects the informative genes related to a specific trait using a gene expression dataset; these trait-specific genes are considered informative genes. The package returns the informative gene set from high-dimensional gene expression data using a combination of Support Vector Machine (SVM) and Minimum Redundancy Maximum Relevance (MRMR) feature selection with a bootstrapping procedure.
The sdrt() function is designed for estimating subspaces for Sufficient Dimension Reduction (SDR) in time series, with a specific focus on the Time Series Central Mean Subspace (TS-CMS). The package employs the Fourier transformation method proposed by Samadi and De Alwis (2023) <doi:10.48550/arXiv.2312.02110> and the Nadaraya-Watson kernel smoother method proposed by Park et al. (2009) <doi:10.1198/jcgs.2009.08076> for estimating the TS-CMS. The package provides tools for estimating distances between subspaces and includes functions for selecting model parameters using the Fourier transformation method.
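An illustrative call, assuming sdrt() accepts a numeric time series together with a lag order, a target dimension, and a method switch; only the function name comes from the description, the argument names are assumptions:

    library(sdrt)
    set.seed(1)
    y <- as.numeric(arima.sim(list(ar = 0.5), n = 200))  # simulated AR(1) series
    # Estimate the TS-CMS via the Fourier transformation method (assumed arguments)
    fit <- sdrt(y, p = 3, d = 1, method = "FM")
    fit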
An easy-to-use implementation of routine structural missing data diagnostics, with functions to visualize the proportions of missing observations, investigate missing data patterns, and conduct various empirical missing data diagnostic tests. Reference: Weberpals J, Raman SR, Shaw PA, Lee H, Hammill BG, Toh S, Connolly JG, Dandreo KJ, Tian F, Liu W, Li J, Hernández-Muñoz JJ, Glynn RJ, Desai RJ. smdi: an R package to perform structural missing data investigations on partially observed confounders in real-world evidence studies. JAMIA Open. 2024 Jan 31;7(1):ooae008. <doi:10.1093/jamiaopen/ooae008>.
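A sketch of the intended diagnostic workflow; smdi_diagnose() and the bundled example dataset smdi_data follow the package's naming convention but are assumptions here:

    library(smdi)
    # Run the suite of structural missing data diagnostics on all partially
    # observed covariates in the example data (assumed function/data names)
    diagnostics <- smdi_diagnose(data = smdi_data)
    diagnostics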
Allows users to map species richness and endemism based on stacked species distribution models (SSDM). Individual SDMs can be created using a single or multiple algorithms (ensemble SDMs). For each species, an SDM can yield a habitat suitability map, a binary map, and a between-algorithm variance map, and can assess variable importance, algorithm accuracy, and between-algorithm correlation. Methods to stack individual SDMs include summing individual probabilities and thresholding then summing. Thresholding can be based on a specific evaluation metric or performed by drawing repeatedly from a Bernoulli distribution. The SSDM package also provides a user-friendly interface.
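A hedged sketch of stacking ensemble SDMs, where Occurrences and Env stand for user-supplied occurrence records and environmental rasters; stack_modelling() and its arguments are assumptions based on the description:

    library(SSDM)
    # Build ensemble SDMs per species with two algorithms and stack them into
    # a species richness map (assumed interface; Occurrences/Env are user data)
    richness <- stack_modelling(c("GLM", "RF"), Occurrences, Env,
                                Xcol = "LONGITUDE", Ycol = "LATITUDE",
                                Spcol = "SPECIES")
    plot(richness)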
The R package CTSV implements the CTSV approach developed by Jinge Yu and Xiangyu Luo, which detects cell-type-specific spatially variable genes while accounting for excess zeros. CTSV directly models sparse raw count data through a zero-inflated negative binomial regression model, incorporates cell-type proportions, and performs hypothesis testing based on the R package pscl. The package outputs p-values and q-values for genes in each cell type, and CTSV is scalable to datasets with tens of thousands of genes measured on hundreds of spots. CTSV can be installed on Windows, Linux, and macOS.
This package provides an up-to-date copy of the Internet Assigned Numbers Authority (IANA) Time Zone Database. It is updated periodically to reflect changes made by political bodies to time zone boundaries, UTC offsets, and daylight saving time rules. Additionally, this package provides a C++ interface for working with the date library. date provides comprehensive support for working with dates and date-times, which this package exposes to make it easier for other R packages to utilize. Headers are provided for calendar-specific calculations, along with a limited interface for time zone manipulations.
This package provides an interface to the Copernicus Data Space Ecosystem API <https://dataspace.copernicus.eu/analyse/apis>, mainly for searching the catalog of available data from Copernicus Sentinel missions and obtaining images for just the area of interest based on selected spectral bands. The package uses the Sentinel Hub REST API interface <https://dataspace.copernicus.eu/analyse/apis/sentinel-hub>, which provides access to various satellite imagery archives. It allows you to access raw satellite data, rendered images, statistical analysis, and other features. This package is in no way officially related to or endorsed by Copernicus.
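A hypothetical search flow; the function names GetOAuthClient() and SearchCatalog() and their arguments are assumptions, and my_aoi stands for a user-supplied sf polygon of the area of interest:

    library(CDSE)
    # Authenticate with Copernicus Data Space credentials (assumed interface)
    client <- GetOAuthClient(id = Sys.getenv("CDSE_ID"),
                             secret = Sys.getenv("CDSE_SECRET"))
    # Search the Sentinel-2 L2A catalogue over the area of interest
    hits <- SearchCatalog(aoi = my_aoi, from = "2024-05-01", to = "2024-05-31",
                          collection = "sentinel-2-l2a", client = client)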
Solves multivariate least squares (MLS) problems subject to constraints on the coefficients, e.g., non-negativity, orthogonality, equality, inequality, monotonicity, unimodality, smoothness, etc. Includes flexible functions for solving MLS problems subject to user-specified equality and/or inequality constraints, as well as a wrapper function that implements 24 common constraint options. Also does k-fold or generalized cross-validation to tune constraint options for MLS problems. See ten Berge (1993, ISBN:9789066950832) for an overview of MLS problems, and see Goldfarb and Idnani (1983) <doi:10.1007/BF02591962> for a discussion of the underlying quadratic programming algorithm.
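A small sketch of a constrained multivariate least squares fit; the cmls() name and its const argument are assumptions about the wrapper function mentioned above:

    library(CMLS)
    set.seed(1)
    X <- matrix(rnorm(100 * 5), 100, 5)
    B_true <- matrix(runif(5 * 3), 5, 3)
    Y <- X %*% B_true + matrix(rnorm(100 * 3, sd = 0.1), 100, 3)
    # Estimate coefficients subject to a non-negativity constraint (assumed API)
    B_hat <- cmls(X = X, Y = Y, const = "nonneg")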
Fits models to datasets with easy-to-interpret Gaussian process modeling and predicts responses for new inputs. The input variables of the datasets can be quantitative, qualitative/categorical, or mixed. The output variable of the datasets is a scalar (quantitative). The optimization of the likelihood function can be chosen by the user (see the documentation of EzGP_fit()). The modeling method is published in "EzGP: Easy-to-Interpret Gaussian Process Models for Computer Experiments with Both Quantitative and Qualitative Factors" by Qian Xiao, Abhyuday Mandal, C. Devon Lin, and Xinwei Deng (2022) <doi:10.1137/19M1288462>.
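A brief sketch using the EzGP_fit() function named above; the argument names (numbers of quantitative and qualitative factors and their levels) and the companion prediction function are assumptions:

    library(EzGP)
    # train_X: design matrix with 2 quantitative and 1 qualitative factor
    # train_y: scalar responses; new_X: prediction points (all user-supplied)
    fit <- EzGP_fit(X = train_X, Y = train_y, p = 2, q = 1, m = 3)
    pred <- EzGP_predict(X_new = new_X, model = fit)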
Perform mediation analysis in the presence of high-dimensional mediators based on the potential outcome framework. High dimensional Bayesian mediation (HDBM), developed by Song et al. (2018) <doi:10.1101/467399>, relies on two Bayesian sparse linear mixed models to simultaneously analyze a relatively large number of mediators for a continuous exposure and outcome, assuming a small number of mediators are truly active. This sparsity assumption also allows the extension of univariate mediator analysis by casting the identification of active mediators as a variable selection problem and applying Bayesian methods with continuous shrinkage priors on the effects.
This package provides a method to integrate molecular profiles of cancer patients (gene copy number and mRNA abundance) to identify candidate gain-of-function alterations. These candidate alterations can subsequently be tested further to discover cancer driver alterations. Briefly, this method tests for genomic correlates of mRNA dysregulation and prioritises those where DNA gains/amplifications are associated with elevated mRNA expression of the same gene. For details see Haider S et al. (2016) "Genomic alterations underlie a pan-cancer metabolic shift associated with tumour hypoxia", Genome Biology, <https://pubmed.ncbi.nlm.nih.gov/27358048/>.
Feature screening is a powerful tool in processing ultrahigh dimensional data. It attempts to screen out most irrelevant features in preparation for a more elaborate analysis. Xu and Chen (2014) <doi:10.1080/01621459.2013.879531> proposed an effective screening method SMLE, which naturally incorporates the joint effects among features in the screening process. This package provides an efficient implementation of SMLE-screening for high-dimensional linear, logistic, and Poisson models. The package also provides a function for conducting accurate post-screening feature selection based on an iterative hard-thresholding procedure and a user-specified selection criterion.
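An illustrative screening-then-selection workflow; the SMLE() and smle_select() names follow the package name, but their arguments are assumptions, and y/X stand for user-supplied response and design matrix:

    library(SMLE)
    # Screen an ultrahigh-dimensional design down to k candidate features,
    # then perform post-screening selection with EBIC (assumed interface)
    screened <- SMLE(Y = y, X = X, k = 20, family = "gaussian")
    selected <- smle_select(screened, criterion = "ebic")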
Structurally guided sampling (SGS) approaches for airborne laser scanning (ALS; LIDAR). Primary functions provide means to generate data-driven stratifications and methods for allocating samples. Intermediate functions for calculating and extracting important information about input covariates and samples are also included. Processing outcomes are intended to help forest and environmental management practitioners better optimize field sample placement as well as assess and augment existing sample networks in the context of data distributions and conditions. ALS data is the primary intended use case; however, any rasterized remote sensing data can be used, enabling data-driven stratifications and sampling approaches.
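A hedged sketch of a stratify-then-sample workflow, where metrics stands for a user-supplied raster of ALS-derived covariates; strat_kmeans() and sample_strat() and their arguments are assumptions based on the description:

    library(sgsR)
    # Create a data-driven stratification from ALS-derived covariates,
    # then allocate samples within the resulting strata (assumed names)
    srast <- strat_kmeans(mraster = metrics, nStrata = 4)
    samples <- sample_strat(sraster = srast, nSamp = 100)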
Facilitates scalable spatiotemporally varying coefficient modelling with Bayesian kernelized tensor regression. The important features of this package are: (a) Enabling local temporal and spatial modeling of the relationship between the response variable and covariates. (b) Implementing the model described by Lei et al. (2023) <doi:10.48550/arXiv.2109.00046>. (c) Using a Bayesian Markov Chain Monte Carlo (MCMC) algorithm to sample from the posterior distribution of the model parameters. (d) Employing a tensor decomposition to reduce the number of estimated parameters. (e) Accelerating tensor operations and enabling graphics processing unit (GPU) acceleration with the torch package.
Different approaches to censored or truncated regression with conditional heteroscedasticity are provided. First, continuous distributions can be used for the (right and/or left censored or truncated) response with separate linear predictors for the mean and variance. Second, cumulative link models for ordinal data (obtained by interval-censoring continuous data) can be employed for heteroscedastic extended logistic regression (HXLR). In the latter type of model, the intercepts depend on the thresholds that define the intervals. The package also provides infrastructure for working with censored or truncated normal, logistic, and Student-t distributions, i.e., d/p/q/r functions and distributions3 objects.
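A small simulated example of the first approach: a left-censored Gaussian model with separate linear predictors for the mean (before "|") and the log-scale (after "|"); treat the exact argument set as an assumption if it differs in your installed version:

    library(crch)
    set.seed(1)
    d <- data.frame(x = rnorm(200))
    d$y <- pmax(0, 1 + 2 * d$x + rnorm(200, sd = exp(0.2 * d$x)))
    # Censored-at-zero Gaussian regression with heteroscedastic scale
    m <- crch(y ~ x | x, data = d, dist = "gaussian", left = 0)
    summary(m)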
This package provides a set of functions to conduct Conjunctive Analysis of Case Configurations (CACC) as described in Miethe, Hart, and Regoeczi (2008) <doi:10.1007/s10940-008-9044-8>, and identify and quantify situational clustering in dominant case configurations as described in Hart (2019) <doi:10.1177/0011128719866123>. Initially conceived as an exploratory technique for multivariate analysis of categorical data, CACC has developed to include formal statistical tests that can be applied in a wide variety of contexts. This technique allows examining composite profiles of different units of analysis in an alternative way to variable-oriented methods.
This package provides tools for the analysis of epidemiological and surveillance data. It contains functions for directly and indirectly adjusting measures of disease frequency, quantifying measures of association on the basis of single or multiple strata of count data presented in a contingency table, computing confidence intervals around incidence risk and incidence rate estimates, and calculating sample sizes for cross-sectional, case-control, and cohort studies. Surveillance tools include functions to calculate an appropriate sample size for 1- and 2-stage representative freedom surveys, functions to estimate surveillance system sensitivity, and functions to support scenario tree modelling analyses.
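A typical strength-of-association calculation from a 2-by-2 table; epi.2by2() and its arguments are shown as an assumption about the package interface, with made-up counts:

    library(epiR)
    # Rows: exposed / unexposed; columns: disease positive / negative
    counts <- matrix(c(13, 2163, 5, 3349), nrow = 2, byrow = TRUE)
    epi.2by2(as.table(counts), method = "cohort.count", conf.level = 0.95)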
Integrates game theory and ecological theory to construct social-ecological models that simulate the management of populations and stakeholder actions. These models build off of a previously developed management strategy evaluation (MSE) framework to simulate all aspects of management: population dynamics, manager observation of populations, manager decision making, and stakeholder responses to management decisions. The newly developed generalised management strategy evaluation (GMSE) framework uses genetic algorithms to mimic the decision-making process of managers and stakeholders under conditions of change, uncertainty, and conflict. Simulations can be run using the gmse(), gmse_apply(), and gmse_gui() functions.
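A minimal run using the functions named above; the specific gmse() argument shown is an illustrative assumption, and gmse_apply() is called with its defaults to produce a single time step for custom simulation loops:

    library(GMSE)
    # Full simulation over 40 time steps (argument name is an assumption)
    sim <- gmse(time_max = 40)
    # One time step with default sub-models, useful inside custom loops
    one_step <- gmse_apply()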
Allows users to create high-quality heatmaps from labelled, hierarchical data. Specifically, for data with a two-level hierarchical structure, it will produce a heatmap where each row and column represents a category at the lower level. These rows and columns are then grouped by the higher-level group each category belongs to, with the names for each category and group shown in the margins. While other packages (e.g., dendextend) allow heatmap rows and columns to be arranged by groups only, hhmR also allows the labelling of the data at both the category and group level.
Joint models have been widely used to study the associations between longitudinal biomarkers and a survival outcome. However, existing joint models only consider one or a few longitudinal biomarkers and cannot deal with high-dimensional longitudinal biomarkers. This package can be used to fit our recently developed penalized joint model that can handle high-dimensional longitudinal biomarkers. Specifically, an adaptive lasso penalty is imposed on the parameters for the effects of the longitudinal biomarkers on the survival outcome, which allows for variable selection. Also, our algorithm is computationally efficient, which is based on the Gaussian variational approximation method.
Provides estimation for particular cases of the power series cure rate model <doi:10.1080/03610918.2011.639971>. For the distribution of the concurrent causes, the alternative models are the Poisson, logarithmic, negative binomial, and Bernoulli (which are included in the original work), the polylogarithm model <doi:10.1080/00949655.2018.1451850>, and the Flory-Schulz model <doi:10.3390/math10244643>. The estimation procedure is based on the EM algorithm discussed in <doi:10.1080/03610918.2016.1202276>. For the distribution of the time-to-event, the alternative models are the slash half-normal, Weibull, gamma, and Birnbaum-Saunders distributions.
This package implements data processing described in <doi:10.1126/sciadv.abk3283> to align modern differentially private data with formatting of older US Census data releases. The primary goal is to read in Census Privacy Protected Microdata Files data in a reproducible way. This includes tools for aggregating to relevant levels of geography by creating geographic identifiers which match the US Census Bureau's numbering. Additionally, there are tools for grouping race numeric identifiers into categories, consistent with OMB (Office of Management and Budget) classifications. Functions exist for downloading and linking to existing sources of privacy protected microdata.