The purpose of this package is to provide a comprehensive R interface to the Decision Support System for Agrotechnology Transfer Cropping Systems Model (DSSAT-CSM; see <https://dssat.net> for more information). The package provides cross-platform functions to read and write input files, run DSSAT-CSM, and read output files.
This package provides a statistical method based on Bayesian Additive Regression Trees with Global Standard Error Permutation Test (BART-G.SE) for descriptor selection and symbolic regression. It finds the symbolic formula of the regression function y=f(x) as described in Ye, Senftle, and Li (2023) <arXiv:2110.10195>.
This package implements a computational framework to predict microbial community-based metabolic profiles with an O2PLS model. It provides procedures for model training and prediction. Paired microbiome and metabolome data are needed for modeling, and the trained model can be applied to predict metabolites of analogous environments using new microbial feature abundances.
An implementation of the iterative proportional fitting (IPFP), maximum likelihood, minimum chi-square and weighted least squares procedures for updating an N-dimensional array with respect to given target marginal distributions (which, in turn, can be multidimensional). The package also provides an application of the IPFP to simulate multivariate Bernoulli distributions.
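As an illustration of the iterative proportional fitting idea only, here is a minimal two-dimensional sketch in Python. It is not the package's implementation or API: the function name `ipfp_2d`, its parameters, and the example table are assumptions made for this example. The seed table is rescaled alternately so that its row sums and column sums match the target margins.

```python
def ipfp_2d(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Hypothetical helper: fit a 2-D table to row and column target margins."""
    table = [row[:] for row in seed]  # work on a copy of the seed table
    for _ in range(max_iter):
        # Step 1: rescale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [value * target / total for value in table[i]]
        # Step 2: rescale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns are exact after step 2, so only the row margins are checked.
        row_error = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if row_error < tol:
            break
    return table


# Assumed example data: a 2 x 2 seed and margins that both sum to 100.
seed = [[1.0, 2.0], [3.0, 4.0]]
fitted = ipfp_2d(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
row_sums = [sum(row) for row in fitted]
col_sums = [sum(row[j] for row in fitted) for j in range(2)]
```

Because every step only multiplies whole rows or whole columns by a constant, the fitted table keeps the odds ratio of the seed while matching the new margins.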
This package provides functions to handle ordinal relations reflected within the feature space. These functions allow searching for ordinal relations in multi-class datasets. One can check whether proposed relations are reflected in a specific feature representation. Furthermore, it provides functions to filter, organize and further analyze those ordinal relations.
This package provides a programmatic interface to many species occurrence data sources, including Global Biodiversity Information Facility ('GBIF'), 'iNaturalist', 'eBird', Integrated Digitized Biocollections ('iDigBio'), 'VertNet', Ocean Biogeographic Information System ('OBIS'), and Atlas of Living Australia ('ALA'). Includes functionality for retrieving species occurrence data, and combining those data.
This package provides functions to calculate EBLUP (Empirical Best Linear Unbiased Predictor) estimators and their MSEs (Mean Squared Errors). Estimators are based on an area-level linear mixed model introduced by Rao and Yu (1994) <doi:10.2307/3315407>. The REML (Residual Maximum Likelihood) method is used for fitting the model.
Functions for (1) calculating daily global solar radiation on a horizontal surface using several well-known models (i.e. Angstrom-Prescott, Supit-Van Kappel, Hargreaves, Bristow and Campbell, and Mahmood-Hubbard), (2) model calibration based on ground-truth data, and (3) model auto-calibration. The FAO Penman-Monteith equation to calculate evapotranspiration is also included.
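To make the first of the listed models concrete, here is a minimal Python sketch of the Angstrom-Prescott relation, which estimates global solar radiation from extraterrestrial radiation and relative sunshine duration. It is not the package's code: the function name and argument names are assumptions, and the default coefficients a = 0.25 and b = 0.50 are the generic values recommended in FAO-56 for sites without calibration, not calibrated values.

```python
def angstrom_prescott(extraterrestrial, sunshine_hours, daylight_hours,
                      a=0.25, b=0.50):
    """Hypothetical helper: Rs = (a + b * n / N) * Ra.

    extraterrestrial: Ra, in the same units as the returned radiation
    sunshine_hours:   n, measured bright sunshine duration
    daylight_hours:   N, maximum possible sunshine duration
    a, b:             regression coefficients (assumed FAO-56 defaults)
    """
    if daylight_hours <= 0:
        raise ValueError("daylight_hours must be positive")
    return (a + b * sunshine_hours / daylight_hours) * extraterrestrial


# Assumed example: Ra = 40 MJ m-2 day-1, 6 sunshine hours out of 12 possible.
radiation = angstrom_prescott(40.0, 6.0, 12.0)
```

Calibration against ground-truth data amounts to replacing the default `a` and `b` with values fitted for the site.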
The Bank of Canada updated their Valet API <https://www.bankofcanada.ca/valet/docs>, and no R client currently exists. This package provides access to all of Valet's endpoints and serves responses in a wide format that is easy for researchers to handle. It also provides tools to access API responses as a list.
This package provides functions to calculate the Water Deficit Index (WDI) and the Evaporative Fraction (EF) using geospatial raster data such as fractional vegetation cover (FVC) and surface-air temperature difference (TS-TA). The package automates regression-based edge fitting and produces continuous spatial maps of surface moisture and evaporative dynamics.
The base functions for set operations in R can be used for only two sets. This package, RVenn, provides functions for dealing with multiple sets. It uses purrr to find the union, intersection and difference of three or more sets. This package also provides functions for pairwise set operations among several sets. Further, based on ggplot2 and ggforce, a Venn diagram can be drawn for two or three sets. For bigger data sets, a clustered heatmap showing the presence or absence of the elements of the sets can be drawn based on the pheatmap package. Finally, an enrichment test can be applied to two sets to determine whether their overlap is statistically significant.
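The idea of extending two-set operations to several sets can be sketched in Python by folding a binary operation over a list of sets. This is an illustration only, not the package's API: the helper names `union_all`, `intersect_all`, `difference_from` and `pairwise_overlaps`, and the example sets, are assumptions made for this sketch.

```python
from functools import reduce
from itertools import combinations


def union_all(sets):
    """Union of any number of sets."""
    return reduce(set.union, sets, set())


def intersect_all(sets):
    """Intersection of any number of sets (empty input gives an empty set)."""
    return reduce(set.intersection, sets) if sets else set()


def difference_from(sets, index):
    """Elements found only in sets[index] and in none of the other sets."""
    others = [s for i, s in enumerate(sets) if i != index]
    return sets[index] - union_all(others)


def pairwise_overlaps(sets):
    """Intersection of every pair of sets, keyed by the pair of positions."""
    return {(i, j): sets[i] & sets[j]
            for i, j in combinations(range(len(sets)), 2)}


# Assumed example data: three small sets of integers.
groups = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 6}]
everything = union_all(groups)
shared = intersect_all(groups)
only_first = difference_from(groups, 0)
pairs = pairwise_overlaps(groups)
```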
Placental epigenetic clock (PlEC) to estimate aging based on gestational age using DNA methylation levels. We developed a PlEC for the 2024 Placental Clock DREAM Challenge (<https://www.synapse.org/Synapse:syn59520082/wiki/628063>). Our PlEC achieved the top performance based on an independent test set. PlEC can be used to identify accelerated/decelerated aging of placenta for understanding placental dysfunction-related conditions, e.g., great obstetrical syndromes including preeclampsia, fetal growth restriction, preterm labor, preterm premature rupture of the membranes, late spontaneous abortion, and placental abruption. Detailed methodologies and examples are documented in our vignette, available at <https://herdiantrisufriyana.github.io/rplec/doc/placental_aging_analysis.html>.
Traditional latent variable models assume that the population is homogeneous, meaning that all individuals in the population are assumed to have the same latent structure. However, this assumption is often violated in practice given that individuals may differ in their age, gender, socioeconomic status, and other factors that can affect their latent structure. The robust expectation maximization (REM) algorithm is a statistical method for estimating the parameters of a latent variable model in the presence of population heterogeneity as recommended by Nieser & Cochran (2023) <doi:10.1037/met0000413>. The REM algorithm is based on the expectation-maximization (EM) algorithm, but it allows for the case when not all the data are generated by the assumed data generating model.
Iterative least cost path and minimum spanning tree methods for projecting forest road networks. The methods connect a set of target points to an existing road network using igraph <https://igraph.org> to identify least cost routes. The cost of constructing a road segment between adjacent pixels is determined by a user supplied weight raster and a weight function; options include the average of adjacent weight raster values, and a function of the elevation differences between adjacent cells that penalizes steep grades. These road network projection methods are intended for integration into R workflows and modelling frameworks used for forecasting forest change, and can be applied over multiple time-steps without rebuilding a graph at each time-step.
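The least cost path idea described above can be sketched in Python with Dijkstra's algorithm on a small weight raster. This is not the package's code, which relies on igraph: the function name `least_cost_path`, the four-neighbour (rook) adjacency, and the example raster are assumptions. The cost of moving between adjacent cells is taken as the average of their two weight values, one of the weight functions mentioned in the description.

```python
import heapq


def least_cost_path(weights, start, goal):
    """Hypothetical helper: Dijkstra's algorithm over a 2-D weight raster.

    Moving between rook-adjacent cells costs the mean of their weights.
    Returns (total_cost, path), where path is a list of (row, col) cells.
    """
    rows, cols = len(weights), len(weights[0])
    best = {start: 0.0}       # cheapest known cost to reach each cell
    previous = {}             # back-pointers used to rebuild the route
    queue = [(0.0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if cost > best.get(cell, float("inf")):
            continue          # stale queue entry, a cheaper route was found
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = (weights[r][c] + weights[nr][nc]) / 2
                new_cost = cost + step
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    previous[(nr, nc)] = cell
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    if goal not in best:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return best[goal], path


# Assumed example raster: the two cells with weight 9 are expensive to cross.
raster = [
    [1, 1, 1],
    [9, 9, 1],
    [1, 1, 1],
]
total_cost, route = least_cost_path(raster, start=(0, 0), goal=(2, 0))
```

In this example the route detours around the expensive cells, because the longer path through cells of weight 1 is cheaper than the direct one.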
This package provides functions for reading, writing, plotting, analysing, and manipulating allelic and haplotypic data, including from VCF files, and for the analysis of population nucleotide sequences and micro-satellites including coalescent analyses, linkage disequilibrium, population structure (Fst, Amova) and equilibrium (HWE), haplotype networks, minimum spanning tree and network, and median-joining networks.
The FDA Adverse Event Reporting System (FAERS) is a database used for the spontaneous reporting of adverse events and medication errors related to human drugs and therapeutic biological products. The faers package serves as the interface between the FAERS database and R. Furthermore, the faers package offers a standardized approach for performing pharmacovigilance analysis.
Alternative polyadenylation (APA) is one of the important post-transcriptional regulation mechanisms which occurs in most human genes. InPAS facilitates the discovery of novel APA sites and the differential usage of APA sites from RNA-Seq data. It leverages cleanUpdTSeq to fine tune identified APA sites by removing false sites.
This package provides a pipeline for analysis of GC-MS data acquired in selected ion monitoring (SIM) mode. The tool also provides guidance in choosing appropriate fragments for the targets of interest by using an optimization algorithm. This is done by considering overlapping peaks from a user-provided library.
This package provides a toolbox for sparse contrastive principal component analysis (scPCA) of high-dimensional biological data. scPCA combines the stability and interpretability of sparse PCA with contrastive PCA's ability to disentangle biological signal from unwanted variation through the use of control data. Also implements and extends cPCA.
The tigre package implements our methodology of Gaussian process differential equation models for analysis of gene expression time series from single input motif networks. The package can be used for inferring unobserved transcription factor (TF) protein concentrations from expression measurements of known target genes, or for ranking candidate targets of a TF.
This package provides a method for the Bayesian functional linear regression model (scalar-on-function), including two estimators of the coefficient function and an estimator of its support. A representation of the posterior distribution is also available. Grollemund P-M., Abraham C., Baragatti M., Pudlo P. (2019) <doi:10.1214/18-BA1095>.
An interface to explore, analyze, and visualize droplet digital PCR (ddPCR) data in R. This is the first non-proprietary software for analyzing two-channel ddPCR data. An interactive tool was also created and is available online to facilitate this analysis for anyone who is not comfortable with using R.
Fast computation of the distance covariance 'dcov' and distance correlation 'dcor'. The computation cost is only O(n log(n)) for the distance correlation (see Chaudhuri, Hu (2019) <arXiv:1810.11332> <doi:10.1016/j.csda.2019.01.016>). The functions are written entirely in C++ to speed up the computation.
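For reference, the quantity being computed can be written down directly from its definition. The Python sketch below is the naive O(n^2) sample distance correlation for univariate data, not the fast O(n log(n)) algorithm the package implements. The function names and example data are assumptions made for this illustration.

```python
import math


def _centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The matrix is symmetric, so row means double as column means.
    return [[dist[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def squared_dcov(x, y):
    """Squared sample distance covariance of two equal-length sequences."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2


def dcor_naive(x, y):
    """Sample distance correlation, a value between 0 and 1."""
    denominator = math.sqrt(squared_dcov(x, x) * squared_dcov(y, y))
    if denominator == 0:
        return 0.0
    return math.sqrt(max(squared_dcov(x, y), 0.0) / denominator)


# Assumed example data: y is an exact linear function of x.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2 * v + 1 for v in x]
linear_dcor = dcor_naive(x, y)
```

An exact linear relationship gives a distance correlation of 1, up to floating-point error, which makes it a convenient sanity check for any faster implementation.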
Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.