Three sets of data and functions for informing ecosystem restoration decisions, particularly in the context of the U.S. Army Corps of Engineers. First, model parameters and associated metadata are compiled as a data set for over 300 habitat suitability models developed by the U.S. Fish and Wildlife Service (USFWS 1980, <https://www.fws.gov/policy-library/870fw1>). Second, functions conduct habitat suitability analyses both for the models described above and for generic user-specified model parameterizations. Third, a suite of decision support tools supports cost-effectiveness and incremental cost analyses (Robinson et al. 1995, IWR Report 95-R-1, U.S. Army Corps of Engineers).
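As a minimal illustration of the habitat suitability idea (plain base R, not the package's own API): suitability-index curves are linearly interpolated at observed habitat values and combined, commonly via a geometric mean, into a single HSI score. The curves and values below are hypothetical.

    # Hypothetical SI curves for water depth (cm) and percent cover
    si_depth <- approx(x = c(0, 50, 100), y = c(0, 1, 0.2), xout = 35)$y
    si_cover <- approx(x = c(0, 25, 100), y = c(0.1, 1, 1), xout = 60)$y
    hsi <- sqrt(si_depth * si_cover)  # geometric mean of the two indices
    hsi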
This package creates interactive trees that can be included in Shiny apps and R Markdown documents. A tree represents hierarchical data (e.g., the contents of a directory). It is similar to the shinyTree package but offers more features and options, such as the grid extension, restricting the drag-and-drop behavior, and settings for the search functionality. It is possible to attach data to the nodes of a tree and then retrieve these data in Shiny when a node is selected. The package also provides a Shiny gadget for manipulating one or more folders, and a Shiny module for navigating the server-side file system.
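A hedged sketch of building a small tree, assuming the jstree() constructor and its nodes argument (a nested list of text/children entries); consult the package documentation for the exact interface.

    library(jsTreeR)
    nodes <- list(
      list(
        text = "Project",
        children = list(
          list(text = "data.csv"),
          list(text = "analysis.R")
        )
      )
    )
    jstree(nodes, search = TRUE, dragAndDrop = TRUE)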
This package implements a long-term forecast model, the "Jubilee-Tectonic model", to forecast future returns of the U.S. stock market, Treasury yield, and gold price. The five-factor model forecasts the 10-year and 20-year future equity returns with high R-squared above 80 percent. It is based on linear growth and mean-reversion characteristics of the U.S. stock market. The model also enhances the CAPE model by introducing the hypothesis that there are fault lines in the historical CAPE, which can be calibrated and corrected through statistical learning. In addition, it contains a module for business cycles, optimal interest rate, and recession forecasts.
This package provides stochastic EM algorithms for latent variable models with a high-dimensional latent space. So far, it provides functions for confirmatory item factor analysis based on the multidimensional two-parameter logistic (M2PL) model and the generalized multidimensional partial credit model. These functions scale well to problems with many latent traits (e.g., thirty or even more) and are virtually tuning-free. The computation is parallelized via the OpenMP multiprocessing API. For more information, please refer to: Zhang, S., Chen, Y., & Liu, Y. (2018). An Improved Stochastic EM Algorithm for Large-scale Full-information Item Factor Analysis. British Journal of Mathematical and Statistical Psychology. <doi:10.1111/bmsp.12153>.
Obtain least-squares means for linear, generalized linear, and mixed models. Compute contrasts or linear functions of least-squares means, and comparisons of slopes. Plots and compact letter displays. Least-squares means were proposed in Harvey, W (1960) "Least-squares analysis of data with unequal subclass numbers", Tech Report ARS-20-8, USDA National Agricultural Library, and discussed further in Searle, Speed, and Milliken (1980) "Population marginal means in the linear model: An alternative to least squares means", The American Statistician 34(4), 216-221 <doi:10.1080/00031305.1980.10483031>. NOTE: lsmeans now relies primarily on code in the emmeans package. lsmeans will be archived in the near future.
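For example, least-squares means and pairwise comparisons for a factor in a linear model:

    library(lsmeans)
    warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
    lsmeans(warp.lm, pairwise ~ tension)  # LS means of tension, averaged over wool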
An implementation of the sample size computation method for network models proposed by Constantin et al. (2021) <doi:10.31234/osf.io/j5v7u>. The implementation takes the form of a three-step recursive algorithm designed to find an optimal sample size given a model specification and a performance measure of interest. It starts with a Monte Carlo simulation step for computing the performance measure and a statistic at various sample sizes selected from an initial sample size range. It continues with a monotone curve-fitting step for interpolating the statistic across the entire sample size range. The final step employs stratified bootstrapping to quantify the uncertainty around the fitted curve.
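The interpolation step can be pictured with plain base R (a toy sketch of the idea, not the package's API): estimate a performance measure by Monte Carlo at a few sample sizes, then fit a monotone curve across the range.

    set.seed(1)
    n_grid <- seq(100, 1000, by = 100)
    perf <- sapply(n_grid, function(n)
      mean(replicate(200, abs(mean(rnorm(n))) < 0.05)))  # toy performance measure
    fit <- isoreg(n_grid, perf)  # monotone (isotonic) curve fit
    plot(fit)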
The main function, plot_GMM, plots output from Gaussian mixture models (GMMs), including both densities and overlaid mixture-component curves from the fitted GMM. The package also includes the function plot_cut_point, which plots the cut point (mu) from the GMM over a histogram of the distribution, with several color options. Finally, the package includes the function plot_mix_comps, which is used inside plot_GMM and can also be used to create custom plots overlaying mixture-component curves from GMMs. For plot_mix_comps, the most common usage is to supply it as the "fun" argument of "stat_function" in a ggplot2 object, as in the example below.
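A hedged example of that stat_function usage, assuming a two-component fit from mixtools::normalmixEM and the (mu, sigma, lam) arguments of plot_mix_comps:

    library(plotGMM)
    library(mixtools)
    library(ggplot2)
    set.seed(1)
    x <- c(rnorm(100, 0, 1), rnorm(100, 4, 1))
    fit <- normalmixEM(x, k = 2)
    ggplot(data.frame(x = x), aes(x)) +
      geom_histogram(aes(y = after_stat(density)), bins = 30) +
      stat_function(fun = plot_mix_comps,
                    args = list(mu = fit$mu[1], sigma = fit$sigma[1], lam = fit$lambda[1])) +
      stat_function(fun = plot_mix_comps,
                    args = list(mu = fit$mu[2], sigma = fit$sigma[2], lam = fit$lambda[2]))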
Classification of pediatric tumors into biologically defined subtypes is challenging, and multifaceted approaches are needed. To this end, we developed a diagnostic classifier based on DNA methylation profiles. We offer MethPed as an easy-to-use toolbox that allows researchers and clinical diagnosticians to test single samples as well as large cohorts for subclass prediction of pediatric brain tumors. The current version of MethPed can classify the following tumor diagnoses/subgroups: Diffuse Intrinsic Pontine Glioma (DIPG), Ependymoma, Embryonal tumors with multilayered rosettes (ETMR), Glioblastoma (GBM), Medulloblastoma (MB) - Group 3 (MB_Gr3), Group 4 (MB_Gr4), Group WNT (MB_WNT), Group SHH (MB_SHH) - and Pilocytic Astrocytoma (PiloAstro).
This package provides tools to estimate tail area-based false discovery rates as well as local false discovery rates for a variety of null models (p-values, z-scores, correlation coefficients, t-scores). The proportion of null values and the parameters of the null distribution are adaptively estimated from the data. In addition, the package contains functions for non-parametric density estimation (Grenander estimator), for monotone regression (isotonic regression and antitonic regression with weights), for computing the greatest convex minorant (GCM) and the least concave majorant (LCM), for the half-normal and correlation distributions, and for computing empirical higher criticism (HC) scores and the corresponding decision threshold.
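For example, estimating tail-area and local false discovery rates from a vector of p-values:

    library(fdrtool)
    set.seed(1)
    p <- c(runif(900), rbeta(100, 1, 50))   # mostly null p-values plus some signal
    res <- fdrtool(p, statistic = "pvalue")
    head(res$qval)  # tail area-based FDR (q-values)
    head(res$lfdr)  # local FDR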
Tools for the analysis of COVID-19 data in Mexico. Downloads and analyzes COVID-19 data from the Mexican General Directorate of Epidemiology (DGE) <https://www.gob.mx/salud/documentos/datos-abiertos-152127>, the Network of Severe Acute Respiratory Infections (Red IRAG) <https://www.gits.igg.unam.mx/red-irag-dashboard/reviewHome>, and the Global Initiative on Sharing All Influenza Data (GISAID) <https://gisaid.org/>.
Time-varying coefficient models for interval censored and right censored survival data including 1) Bayesian Cox model with time-independent, time-varying or dynamic coefficients for right censored and interval censored data studied by Sinha et al. (1999) <doi:10.1111/j.0006-341X.1999.00585.x> and Wang et al. (2013) <doi:10.1007/s10985-013-9246-8>, 2) Spline based time-varying coefficient Cox model for right censored data proposed by Perperoglou et al. (2006) <doi:10.1016/j.cmpb.2005.11.006>, and 3) Transformation model with time-varying coefficients for right censored data using estimating equations proposed by Peng and Huang (2007) <doi:10.1093/biomet/asm058>.
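A heavily hedged sketch, assuming the bayesCox() interface and the bundled tooth interval-censored data (both named from memory); argument names may differ, so consult the package documentation.

    library(dynsurv)
    library(survival)
    data(tooth, package = "dynsurv")
    fit <- bayesCox(Surv(left, right, type = "interval2") ~ dmf + sex,
                    data = tooth, model = "TimeVarying")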
Provides an optimal histogram, in the sense of probability density estimation and feature detection, by means of multiscale variational inference. In other words, the resulting histogram serves as an optimal density estimator while recovering features, such as increases or modes, with control of both false positives and false negatives. Moreover, it provides a parsimonious representation in terms of the number of blocks, which simplifies data interpretation. The only assumption of the method is that the data points are independent and identically distributed, so it applies to fairly general situations, including continuous distributions, discrete distributions, and mixtures of both. For details see Li, Munk, Sieling and Walther (2016) <arXiv:1612.07216>.
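A minimal sketch, assuming the essHistogram() interface:

    library(essHist)
    set.seed(1)
    x <- c(rnorm(500, -2), rnorm(500, 2))  # bimodal sample
    essHistogram(x)  # essential histogram; returns an object of class "histogram"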
Using mixed-effects models to analyse longitudinal gene expression can highlight differences between sample groups over time. The most widely used differential gene expression tools cannot fit linear mixed-effects models and are less suited to analysing longitudinal data. This package provides negative binomial and Gaussian mixed-effects models for fitting gene expression and other biological data across repeated samples. This is particularly useful for investigating changes in RNA-Sequencing gene expression between groups of individuals over time, as described in: Rivellese, F., Surace, A. E., Goldmann, K., Sciacca, E., Cubuk, C., Giorli, G., ... Lewis, M. J., & Pitzalis, C. (2022) Nature Medicine <doi:10.1038/s41591-022-01789-0>.
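A hedged toy sketch, assuming the glmmSeq() interface (a model formula without a response, a genes-by-samples count matrix, sample metadata, and per-gene dispersions); the toy data here are invented, so see the package vignette for authoritative usage.

    library(glmmSeq)
    set.seed(1)
    counts <- matrix(rnbinom(5 * 12, mu = 100, size = 10), nrow = 5,
                     dimnames = list(paste0("gene", 1:5), paste0("s", 1:12)))
    metadata <- data.frame(ID = factor(rep(1:6, each = 2)),
                           Timepoint = rep(c(0, 6), 6),
                           Group = rep(c("A", "B"), each = 6))
    disp <- setNames(rep(0.1, 5), rownames(counts))
    fit <- glmmSeq(~ Timepoint * Group + (1 | ID),
                   countdata = counts, metadata = metadata, dispersion = disp)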
Performance measures and scores for statistical classification such as accuracy, sensitivity, specificity, recall, similarity coefficients, AUC, GINI index, Brier score and many more. Calculation of optimal cut-offs and decision stumps (Iba and Langley (1991), <doi:10.1016/B978-1-55860-247-2.50035-8>) for all implemented performance measures. Hosmer-Lemeshow goodness of fit tests (Lemeshow and Hosmer (1982), <doi:10.1093/oxfordjournals.aje.a113284>; Hosmer et al (1997), <doi:10.1002/(SICI)1097-0258(19970515)16:9%3C965::AID-SIM509%3E3.0.CO;2-O>). Statistical and epidemiological risk measures such as relative risk, odds ratio, number needed to treat (Porta (2014), <doi:10.1093%2Facref%2F9780199976720.001.0001>).
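The basic measures reduce to simple arithmetic on a 2x2 confusion matrix (plain base R, independent of the package's own functions):

    tp <- 40; fn <- 10; fp <- 5; tn <- 45
    sensitivity <- tp / (tp + fn)                # recall = 0.80
    specificity <- tn / (tn + fp)                # 0.90
    accuracy <- (tp + tn) / (tp + fn + fp + tn)  # 0.85
    c(sensitivity, specificity, accuracy)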
This package provides functions for O2PLS-DA analysis for multiple omics data integration. The algorithm comes from "O2-PLS, a two-block (X-Y) latent variable regression (LVR) method with an integral OSC filter", published by Johan Trygg and Svante Wold in 2003 <doi:10.1002/cem.775>. O2PLS is a bidirectional multivariate regression method that aims to separate the covariance between two data sets (it was recently extended to multiple data sets) (Löfstedt and Trygg, 2011 <doi:10.1002/cem.1388>; Löfstedt et al., 2012 <doi:10.1016/j.aca.2013.06.026>) from the systematic sources of variance specific to each data set separately.
Offers a range of utilities and functions for everyday programming tasks: 1. Data manipulation, such as grouping and merging, column splitting, and character expansion. 2. File handling: reading and converting files in popular formats. 3. Plotting assistance: helpful utilities for generating color palettes, validating color formats, and adding transparency. 4. Statistical analysis: functions for pairwise comparisons and multiple-testing corrections, making routine statistical analyses easy to perform. 5. Graph plotting: efficient tools for creating doughnut plots and multi-layered doughnut plots; Venn diagrams, including traditional Venn diagrams, upset plots, and flower plots; and simplified functions for creating stacked bar plots, or box plots with letter annotations for multiple-comparison groups.
This package provides tools for translating environmental change into organismal response. Microclimate models to vertically scale weather station data to organismal heights. The biophysical modeling tools include both general models for heat flows and specific models to predict body temperatures for a variety of ectothermic taxa. Additional functions model and temporally partition air and soil temperatures and solar radiation. Utility functions estimate the organismal and environmental parameters needed for biophysical ecology. TrenchR
focuses on relatively simple and modular functions so users can create transparent and flexible biophysical models. Many functions are derived from Gates (1980) <doi:10.1007/978-1-4612-6024-0> and Campbell and Norman (1998) <isbn:9780387949376>.
Fit a logistic regression model using Firth's bias reduction method, equivalent to penalization of the log-likelihood by the Jeffreys prior. Confidence intervals for regression coefficients can be computed by penalized profile likelihood. Firth's method was proposed as an ideal solution to the problem of separation in logistic regression; see Heinze and Schemper (2002) <doi:10.1002/sim.1047>. If needed, the bias reduction can be turned off such that ordinary maximum likelihood logistic regression is obtained. Two modifications of Firth's method, FLIC and FLAC, which lead to unbiased predictions, are also available in the package; see Puhr et al. (2017) <doi:10.1002/sim.7273>.
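For example, on the sex2 data shipped with the package:

    library(logistf)
    data(sex2)
    fit <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2)
    summary(fit)  # penalized-likelihood confidence intervals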
Companion R package for the course "Statistical analysis of correlated and repeated measurements for health science researchers" taught by the Section of Biostatistics of the University of Copenhagen. It implements linear mixed models in which the variance-covariance model for the residuals is specified via patterns (compound symmetry, Toeplitz, unstructured, ...). Statistical inference for mean, variance, and correlation parameters is performed based on the observed information and a Satterthwaite approximation of the degrees of freedom. Normalized residuals are provided to assess model misspecification. Statistical inference can be performed for arbitrary linear or non-linear combinations of model coefficients. Predictions can be computed conditional on covariates only or also on outcome values.
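A hedged sketch, assuming the lmm() interface with a compound-symmetry pattern and the bundled gastricbypassL data (both named from memory):

    library(LMMstar)
    data(gastricbypassL, package = "LMMstar")
    fit <- lmm(weight ~ visit, repetition = ~ visit | id,
               structure = "CS", data = gastricbypassL)
    summary(fit)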
Transfer learning, a prevailing technique in computer science, aims to improve the performance of a target model by leveraging auxiliary information from heterogeneous source data. We provide novel tools for multi-source transfer learning under statistical models based on model averaging strategies, including linear regression models and partially linear models. Unlike existing transfer learning approaches, this method integrates the auxiliary information through data-driven weight assignments to avoid negative transfer. This is the first package for transfer learning based on optimal model averaging frameworks, providing efficient implementations for practitioners in multi-source data modeling. The details are described in Hu and Zhang (2023) <https://jmlr.org/papers/v24/23-0030.html>.
Machine learning method specifically designed for pre-miRNA prediction. It takes advantage of unlabeled sequences to improve prediction rates even when there are only a few positive examples, when the negative examples are unreliable, or when they are not good representatives of their class. Furthermore, the method can automatically search for negative examples if the user is unable to provide them. MiRNAss can find a good boundary to divide the pre-miRNAs from other groups of sequences; it automatically optimizes the threshold that defines the class boundaries and is thus robust to high class imbalance. Each step of the method is scalable and can handle large volumes of data.
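A hedged toy sketch, assuming the miRNAss() interface: a numeric feature matrix plus a label vector with 1 (positive), -1 (negative), and 0 (unlabeled). The random features here are placeholders for real sequence features.

    library(miRNAss)
    set.seed(1)
    x <- matrix(rnorm(200 * 5), ncol = 5)   # toy sequence features
    y <- rep(0, 200)                        # mostly unlabeled
    y[1:10] <- 1                            # a few known pre-miRNAs
    y[11:20] <- -1                          # a few known negatives
    pred <- miRNAss(features = x, labels = y)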
This package provides a function for estimating the transition probabilities in an illness-death model. The transition probabilities can be estimated from the unsmoothed landmark estimators developed by de Una-Alvarez and Meira-Machado (2015) <doi:10.1111/biom.12288>. Presmoothed estimates can also be obtained through the use of a parametric family of binary regression curves, such as logit, probit or cauchit. The additive logistic regression model and nonparametric regression are also implemented as alternatives. The idea behind the presmoothed landmark estimators is to use the presmoothing techniques developed by Cao et al. (2005) <doi:10.1007/s00180-007-0076-6> in the landmark estimation of the transition probabilities.
An introduction to a couple of novel predictive variable selection methods for generalised boosted regression modeling (gbm). They are based on various variable influence methods (i.e., relative variable influence (RVI) and knowledge-informed RVI (i.e., KIRVI and KIRVI2)) that adopt ideas similar to AVI, KIAVI and KIAVI2 in the steprf package, and on predictive accuracy in stepwise algorithms. For details of the variable selection methods, please see: Li, J., Siwabessy, J., Huang, Z. and Nichol, S. (2019) <doi:10.3390/geosciences9040180>; Li, J., Alvarez, B., Siwabessy, J., Tran, M., Huang, Z., Przeslawski, R., Radke, L., Howard, F., Nichol, S. (2017) <doi:10.13140/RG.2.2.27686.22085>.
Computation of stopping boundaries for a single-arm trial using a Bayesian criterion; i.e., for each m <= n (n = total patient number of the trial) the smallest number of observed toxicities is calculated that leads to the termination of the trial/accrual according to the specified criteria. The probabilities of stopping the trial/accrual at and up until (respectively) the m-th patient (m <= n) are also calculated. This design is more conservative than the frequentist approach (using Clopper-Pearson CIs), which might be preferred as it concerns safety. See also Aamot et al. (2010) "Continuous monitoring of toxicity in clinical trials - simulating the risk of stopping prematurely" <doi:10.5414/cpp48476>.
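The criterion can be illustrated with plain base R (a sketch of the idea, not the package's API): under a Beta(1, 1) prior on the toxicity rate, stop after m patients once the posterior probability that the rate exceeds p0 = 0.3 passes 0.9 (both values chosen for illustration).

    p0 <- 0.3; thresh <- 0.9; n <- 20
    boundary <- sapply(1:n, function(m) {
      tox <- 0:m
      post <- 1 - pbeta(p0, 1 + tox, 1 + m - tox)  # P(rate > p0 | tox out of m)
      if (any(post > thresh)) min(tox[post > thresh]) else NA
    })
    boundary  # smallest toxicity count triggering a stop, for m = 1..n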