Buckley-James regression for right-censored survival data with high-dimensional covariates. Implementations for survival data include boosting with componentwise linear least squares, componentwise smoothing splines, regression trees and MARS. Other high-dimensional tools include penalized regression for survival data. See Wang and Wang (2010) <doi:10.2202/1544-6115.1550>.
This package provides a robust constrained L1 minimization method for estimating a large sparse inverse covariance matrix (aka precision matrix), and recovering its support for building graphical models. The computation uses linear programming. The method was published in TT Cai, W Liu, X Luo (2011) <doi:10.1198/jasa.2011.tm10155>.
Contrast analysis for factorial designs provides an alternative to the traditional ANOVA approach, offering the distinct advantage of testing targeted hypotheses. This package is based primarily on the works of Rosenthal, Rosnow, and Rubin (2000, ISBN: 978-0521659802) as well as Sedlmeier and Renkewitz (2018, ISBN: 978-3868943214).
This package implements the general template for collaborative targeted maximum likelihood estimation. It also provides several commonly used C-TMLE instantiations, like the vanilla/scalable variable-selection C-TMLE (Ju et al. (2017) <doi:10.1177/0962280217729845>) and the glmnet-C-TMLE algorithm (Ju et al. (2017) <arXiv:1706.10029>).
The purpose of this package is to provide a comprehensive R interface to the Decision Support System for Agrotechnology Transfer Cropping Systems Model (DSSAT-CSM; see <https://dssat.net> for more information). The package provides cross-platform functions to read and write input files, run DSSAT-CSM, and read output files.
Mechanistically models and predicts the phenology (macro-phases) of 10 crop plants, trained on a big dataset over 80 years derived from the German weather service (DWD) <https://opendata.dwd.de/>. It can be applied for remote sensing purposes, and it dynamically checks the best subset of available covariates for the given dataset and crop.
This package provides a statistical method based on Bayesian Additive Regression Trees with Global Standard Error Permutation Test (BART-G.SE) for descriptor selection and symbolic regression. It finds the symbolic formula of the regression function y=f(x) as described in Ye, Senftle, and Li (2023) <arXiv:2110.10195>.
This package implements a computational framework to predict microbial community-based metabolic profiles with an O2PLS model. It provides procedures for model training and prediction. Paired microbiome and metabolome data are needed for modeling, and the trained model can be applied to predict metabolites of analogous environments using new microbial feature abundances.
An implementation of the iterative proportional fitting procedure (IPFP), maximum likelihood, minimum chi-square and weighted least squares procedures for updating an N-dimensional array with respect to given target marginal distributions (which, in turn, can be multidimensional). The package also provides an application of the IPFP to simulate multivariate Bernoulli distributions.
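The core idea of iterative proportional fitting can be sketched for the simplest case, a 2-D table with one-dimensional row and column targets. The sketch below is written in Python for illustration only and is not the package's R interface; the function name `ipf_2d`, its parameters, and the seed table are hypothetical.

```python
def ipf_2d(table, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting for a 2-D table (illustrative sketch).

    Assumes strictly positive cells and targets with equal totals.
    """
    fitted = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        # Rescale each row so that its sum matches the target row marginal.
        for i, target in enumerate(row_targets):
            s = sum(fitted[i])
            if s > 0:
                fitted[i] = [v * target / s for v in fitted[i]]
        # Rescale each column so that its sum matches the target column marginal.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in fitted)
            if s > 0:
                for row in fitted:
                    row[j] *= target / s
        # Columns now match exactly; stop once the rows also match.
        err = max(abs(sum(fitted[i]) - t) for i, t in enumerate(row_targets))
        if err < tol:
            break
    return fitted


# Made-up seed table and targets (both sets of targets sum to 150).
seed = [[40, 30], [35, 50]]
fitted = ipf_2d(seed, row_targets=[60, 90], col_targets=[80, 70])
```

Because each step only rescales whole rows or whole columns, the fitted table keeps the cross-product (odds) ratio of the seed table while matching the new marginals.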
This package provides functions to handle ordinal relations reflected within the feature space. These functions allow searching for ordinal relations in multi-class datasets. One can check whether proposed relations are reflected in a specific feature representation. Furthermore, it provides functions to filter, organize and further analyze those ordinal relations.
This package provides a programmatic interface to many species occurrence data sources, including Global Biodiversity Information Facility ('GBIF'), 'iNaturalist', 'eBird', Integrated Digitized Biocollections ('iDigBio'), 'VertNet', Ocean Biogeographic Information System ('OBIS'), and Atlas of Living Australia ('ALA'). Includes functionality for retrieving species occurrence data, and combining those data.
Calculates daily global solar radiation on a horizontal surface using several well-known models (i.e. Angstrom-Prescott, Supit-Van Kappel, Hargreaves, Bristow and Campbell, and Mahmood-Hubbard), and supports model calibration based on ground-truth data as well as model auto-calibration. The FAO Penman-Monteith equation to calculate evapotranspiration is also included.
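As an illustration of the simplest of these models, the Angstrom-Prescott relation estimates global radiation as Rs = (a + b * n/N) * Ra, where Ra is extraterrestrial radiation, n the observed sunshine duration and N the day length. The Python sketch below is not the package's R interface; the function names are hypothetical, the defaults a = 0.25 and b = 0.50 are the values commonly used when no local calibration is available, the ordinary least squares calibration is an assumption about how coefficients could be fitted, and the numbers are synthetic.

```python
def angstrom_prescott(ra, sunshine_hours, day_length, a=0.25, b=0.50):
    """Global solar radiation from the Angstrom-Prescott relation (sketch).

    ra and the returned value share the same unit (e.g. MJ m^-2 day^-1).
    """
    if day_length <= 0:
        raise ValueError("day_length must be positive")
    # Relative sunshine duration n/N, clamped to the physical range [0, 1].
    ratio = min(max(sunshine_hours / day_length, 0.0), 1.0)
    return (a + b * ratio) * ra


def calibrate(ra_values, rs_observed, sunshine_ratios):
    """Fit a and b by ordinary least squares of Rs/Ra on n/N (assumed approach)."""
    y = [rs / ra for rs, ra in zip(rs_observed, ra_values)]
    x = list(sunshine_ratios)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    return a, b


# Half of the possible sunshine on a day with Ra = 30.
rs = angstrom_prescott(ra=30.0, sunshine_hours=6.0, day_length=12.0)

# Synthetic observations generated with a = 0.20 and b = 0.55.
a_fit, b_fit = calibrate(
    ra_values=[30.0, 30.0, 30.0],
    rs_observed=[9.3, 14.25, 19.2],
    sunshine_ratios=[0.2, 0.5, 0.8],
)
```

Calibrating against ground-truth measurements replaces the default coefficients with site-specific ones, which is the role the description assigns to model calibration.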
This package provides functions to calculate EBLUP (Empirical Best Linear Unbiased Predictor) estimators and their MSEs (Mean Squared Errors). Estimators are based on an area-level linear mixed model introduced by Rao and Yu (1994) <doi:10.2307/3315407>. The REML (Residual Maximum Likelihood) method is used for fitting the model.
The Bank of Canada updated their Valet API <https://www.bankofcanada.ca/valet/docs>, and no R client currently exists. This package provides access to all of Valet's endpoints and serves responses in a wide format that is easy for researchers to handle; it also provides tools to access API responses as a list.
This package provides functions to calculate the Water Deficit Index (WDI) and the Evaporative Fraction (EF) using geospatial raster data such as fractional vegetation cover (FVC) and surface-air temperature difference (TS-TA). The package automates regression-based edge fitting and produces continuous spatial maps of surface moisture and evaporative dynamics.
The package DAPAR is a Bioconductor-distributed R package which provides all the necessary functions to analyze quantitative data from label-free proteomics experiments. Contrary to most other similar R packages, it is endowed with rich and user-friendly graphical interfaces, so that no programming skill is required (see `Prostar` package).
The Power Law Global Error Model (PLGEM) has been shown to faithfully model the variance-versus-mean dependence that exists in a variety of genome-wide datasets, including microarray and proteomics data. The use of PLGEM has been shown to improve the detection of differentially expressed genes or proteins in these datasets.
The main function of this package is beep(), whose purpose is to make it easy to play notification sounds on whatever platform you are on. It is intended to be useful, for example, if you are running a long analysis in the background and want to know when it is ready.
A placental epigenetic clock (PlEC) that estimates aging based on gestational age using DNA methylation levels. We developed a PlEC for the 2024 Placental Clock DREAM Challenge (<https://www.synapse.org/Synapse:syn59520082/wiki/628063>). Our PlEC achieved the top performance based on an independent test set. PlEC can be used to identify accelerated/decelerated aging of the placenta for understanding placental dysfunction-related conditions, e.g., great obstetrical syndromes including preeclampsia, fetal growth restriction, preterm labor, preterm premature rupture of the membranes, late spontaneous abortion, and placental abruption. Detailed methodologies and examples are documented in our vignette, available at <https://herdiantrisufriyana.github.io/rplec/doc/placental_aging_analysis.html>.
Traditional latent variable models assume that the population is homogeneous, meaning that all individuals in the population are assumed to have the same latent structure. However, this assumption is often violated in practice given that individuals may differ in their age, gender, socioeconomic status, and other factors that can affect their latent structure. The robust expectation maximization (REM) algorithm is a statistical method for estimating the parameters of a latent variable model in the presence of population heterogeneity as recommended by Nieser & Cochran (2023) <doi:10.1037/met0000413>. The REM algorithm is based on the expectation-maximization (EM) algorithm, but it allows for the case when not all the data are generated by the assumed data-generating model.
The base functions for set operations in R can be used for only two sets. This package, RVenn, provides functions for dealing with multiple sets. It uses purrr to find the union, intersection and difference of three or more sets. This package also provides functions for pairwise set operations among several sets. Further, based on ggplot2 and ggforce, a Venn diagram can be drawn for two or three sets. For bigger data sets, a clustered heatmap showing the presence or absence of the elements of the sets can be drawn based on the pheatmap package. Finally, an enrichment test can be applied to two sets to determine whether their overlap is statistically significant.
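Extending a two-set operation to many sets amounts to folding it over the collection. The following Python sketch illustrates that idea only; it is not the package's R interface, the function names are hypothetical, and "difference" is taken here to mean the elements of the first set that appear in none of the others, which is an assumption.

```python
from functools import reduce
from itertools import combinations


def union_all(sets):
    """Union of any number of sets."""
    return reduce(set.union, sets, set())


def intersect_all(sets):
    """Elements common to every set (empty input gives an empty set)."""
    sets = list(sets)
    return reduce(set.intersection, sets) if sets else set()


def difference_all(sets):
    """Elements of the first set that appear in none of the remaining sets."""
    sets = list(sets)
    return reduce(set.difference, sets) if sets else set()


def pairwise_intersections(named_sets):
    """Intersection for every unordered pair of named sets."""
    return {
        (a, b): named_sets[a] & named_sets[b]
        for a, b in combinations(named_sets, 2)
    }


# Made-up example sets.
groups = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {4, 5, 6}}
everything = union_all(groups.values())
shared = intersect_all(groups.values())
only_a = difference_all(groups.values())
pairs = pairwise_intersections(groups)
```

The fold keeps the many-set functions as thin wrappers around the familiar two-set operations, so their semantics stay easy to check.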
Iterative least cost path and minimum spanning tree methods for projecting forest road networks. The methods connect a set of target points to an existing road network using igraph <https://igraph.org> to identify least cost routes. The cost of constructing a road segment between adjacent pixels is determined by a user-supplied weight raster and a weight function; options include the average of adjacent weight raster values, and a function of the elevation differences between adjacent cells that penalizes steep grades. These road network projection methods are intended for integration into R workflows and modelling frameworks used for forecasting forest change, and can be applied over multiple time-steps without rebuilding a graph at each time-step.
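A single least cost route under the "average of adjacent weight raster values" option can be sketched with Dijkstra's algorithm on a grid. This Python sketch is illustrative only: it does not use igraph or the package's R interface, the function name `least_cost_path` is hypothetical, 4-neighbour adjacency is an assumption, and the weight raster is made up.

```python
import heapq


def least_cost_path(weights, start, goal):
    """Dijkstra on a 4-connected grid; a step costs the mean of the two cell weights."""
    rows, cols = len(weights), len(weights[0])
    best = {start: 0.0}   # lowest known cost to reach each cell
    prev = {}             # predecessor of each cell on its best route
    heap = [(0.0, start)]
    while heap:
        cost, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if cost > best.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                # Cost of a segment is the average of the adjacent raster values.
                step = (weights[r][c] + weights[nr][nc]) / 2.0
                new_cost = cost + step
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    # Walk the predecessor links back from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[goal]


# Made-up weight raster: the two cells with weight 9 are expensive to cross.
raster = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
path, cost = least_cost_path(raster, start=(0, 0), goal=(0, 2))
```

In this example the route detours around the expensive cells through the bottom row. An iterative variant would treat each newly projected route as part of the road network before connecting the next target point.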
This package provides a method for the Bayesian functional linear regression model (scalar-on-function), including two estimators of the coefficient function and an estimator of its support. A representation of the posterior distribution is also available. See Grollemund P-M., Abraham C., Baragatti M., Pudlo P. (2019) <doi:10.1214/18-BA1095>.
An interface to explore, analyze, and visualize droplet digital PCR (ddPCR) data in R. This is the first non-proprietary software for analyzing two-channel ddPCR data. An interactive tool was also created and is available online to facilitate this analysis for anyone who is not comfortable with using R.