Visualization of decision rules for binary classification and Receiver Operating Characteristic (ROC) curve estimation under different generalizations proposed in the literature:
- making the classification subsets flexible, to cover scenarios where both extremes of the marker are associated with a higher risk of being positive, by considering two thresholds (gROC() function);
- transforming the marker by a suitable function in order to improve classification performance (hROC() function);
- for multivariate markers, considering a transformation to a univariate space that seeks to maximize the TPR for each FPR, and hence the resulting AUC (multiROC() function).
The classification regions behind each point of the ROC curve can be displayed as static graphics (plot_buildROC(), plot_regions(), or plot_funregions() functions) or as videos (movieROC() function).
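A minimal sketch of the two-threshold workflow follows; the package name and argument order are assumptions inferred from the function names above, so consult the package documentation for the actual interface.

```r
# Hedged sketch: X is a univariate marker, D the binary status (0/1).
# gROC() and plot_regions() are named above; their argument order is assumed.
library(movieROC)   # package name assumed from the movieROC() function above
set.seed(1)
X <- c(rnorm(50, 0), rnorm(50, 2))
D <- rep(c(0, 1), each = 50)
roc_g <- gROC(X, D)    # generalized ROC with two classification thresholds
plot_regions(roc_g)    # classification regions behind each ROC point
```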
Collision risk models for avian fauna (seabirds and migratory birds) at offshore wind farms. The base deterministic model is derived from Band (2012) <https://tethys.pnnl.gov/publications/using-collision-risk-model-assess-bird-collision-risks-offshore-wind-farms>. This was further expanded by Masden (2015) <doi:10.7489/1659-1>, and the code used here is heavily derived from that work, with input from Dr A. Cook at the British Trust for Ornithology. These collision risk models are useful for marine ornithologists working in the offshore wind industry, particularly in UK waters. However, many of the species included in the stochastic collision risk models can also be found in the North Atlantic off the United States and Canada, where the models could be applied as well.
This package implements functions that calculate upper prediction bounds on the false discovery proportion (FDP) in the list of discoveries returned by competition-based setups, implementing Ebadi et al. (2022) <arXiv:2302.11837>. Such setups include target-decoy competition (TDC) in computational mass spectrometry and the knockoff construction in linear regression (note that this package typically uses the terminology of TDC). Included are the standardized (TDC-SB) and uniform (TDC-UB) bounds on TDC's FDP, as well as simultaneous standardized and uniform bands. The package requires pre-computed Monte Carlo statistics available at <https://github.com/uni-Arya/fdpbandsdata>. This data can be downloaded by running the command devtools::install_github("uni-Arya/fdpbandsdata") in R and restarting R after installation. The size of this data is roughly 81 MB.
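The data dependency is installed exactly as described above; only the optional devtools bootstrap line is an addition here.

```r
# Install the pre-computed Monte Carlo statistics, then restart R
# before computing the TDC-SB / TDC-UB bounds.
# install.packages("devtools")   # if devtools is not yet available
devtools::install_github("uni-Arya/fdpbandsdata")
```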
Hybrid models, which combine decomposition and deep learning techniques, are among the most promising approaches for improving the accuracy of time series forecasting. A decomposition technique splits a time series into a set of intrinsic mode functions (IMFs), and the obtained IMFs are modelled and forecasted separately using deep learning models. Finally, the forecasts of all IMFs are combined into an ensemble output for the time series. The predictive ability of the developed models is evaluated on international monthly price series of maize in terms of criteria such as root mean squared error, mean absolute percentage error, and mean absolute error. For method details see Choudhary, K. et al. (2023) <https://ssca.org.in/media/14_SA44052022_R3_SA_21032023_Girish_Jha_FINAL_Finally.pdf>.
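The decompose-forecast-recombine loop can be sketched in a few lines of base R. This is only an illustration of the idea: a crude moving-average split stands in for a proper IMF decomposition, and arima() stands in for the deep learning models.

```r
# Illustrative decomposition-ensemble sketch (not this package's API).
set.seed(42)
y <- as.numeric(AirPassengers)                 # example monthly series
trend <- stats::filter(y, rep(1 / 12, 12), sides = 1)
trend[is.na(trend)] <- y[is.na(trend)]         # pad the initial window
imfs <- list(detail = y - trend, trend = trend)  # crude two-component split
h <- 12
fc <- lapply(imfs, function(comp) {
  # each component is modelled separately; arima() is a stand-in learner
  predict(arima(comp, order = c(1, 1, 1)), n.ahead = h)$pred
})
ensemble_forecast <- Reduce(`+`, fc)           # recombine component forecasts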
Generate reports that enable quick visual review of temporal shifts in record-level data. Time series plots showing aggregated values are automatically created for each data field (column) depending on its contents (e.g. min/max/mean values for numeric data, number of distinct values for categorical data), as well as overviews for missing values, non-conformant values, and duplicated rows. The resulting reports are shareable and can contribute to forming a transparent record of the entire analysis process. The package is designed with Electronic Health Records in mind, but can be used for any type of record-level temporal data (i.e. tabular data where each row represents a single "event", one column contains the "event date", and other columns contain any associated values for the event).
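The per-field temporal aggregation that the reports are built from can be illustrated in plain base R; the package's own report-generating functions are not shown here.

```r
# Hedged base-R illustration: aggregate one numeric and one categorical
# field by event date, as the automatic report would.
df <- data.frame(
  event_date = as.Date("2024-01-01") + sample(0:59, 200, replace = TRUE),
  value      = rnorm(200),
  category   = sample(letters[1:3], 200, replace = TRUE)
)
by_day <- split(df, df$event_date)
summ <- data.frame(
  date       = as.Date(names(by_day)),
  mean_value = sapply(by_day, function(d) mean(d$value)),
  n_distinct = sapply(by_day, function(d) length(unique(d$category)))
)
plot(summ$date, summ$mean_value, type = "l",
     xlab = "event date", ylab = "daily mean of numeric field")
```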
Identification of causal effects from arbitrary observational and experimental probability distributions via do-calculus and standard probability manipulations using a search-based algorithm by Tikka, Hyttinen and Karvanen (2021) <doi:10.18637/jss.v099.i05>. Allows for the presence of mechanisms related to selection bias (Bareinboim and Tian, 2015) <doi:10.1609/aaai.v29i1.9679>, transportability (Bareinboim and Pearl, 2014) <http://ftp.cs.ucla.edu/pub/stat_ser/r443.pdf>, missing data (Mohan, Pearl, and Tian, 2013) <http://ftp.cs.ucla.edu/pub/stat_ser/r410.pdf>, and arbitrary combinations of these. Also supports identification in the presence of context-specific independence (CSI) relations through labeled directed acyclic graphs (LDAGs). For details on CSIs see Corander et al. (2019) <doi:10.1016/j.apal.2019.04.004>.
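A back-door style query can be posed as in the sketch below; this assumes the string-based dosearch() interface described in the cited JSS paper, so verify the details against the package documentation.

```r
# Declare the available distribution, the causal query, and the graph,
# then search for an identifying formula (assumed dosearch() interface).
library(dosearch)
data  <- "p(x, y, z)"          # what has been observed
query <- "p(y | do(x))"        # the interventional quantity of interest
graph <- "x -> y
          z -> x
          z -> y"
dosearch(data, query, graph)   # returns an identifying formula if one exists
```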
Estimation of DIFferential COexpressed NETworks using diverse built-in and user-defined metrics. The package serves three purposes related to the estimation of differential coexpression. First, to estimate differential coexpression, where coexpression is estimated, by default, by Spearman correlation; this requires a metric to compare two correlation distributions. The package includes six metrics, some of which need a threshold, and a new metric can also be specified as a user function with specific parameters (see difconet.run). Significance is estimated by permutations. Second, to generate datasets with controlled differential correlation, done by either adding noise or adding specific correlation structure. Third, to display the results of differential correlation analyses. Please see <http://bioinformatica.mty.itesm.mx/difconet> for further information.
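As described above, a user metric is simply an R function that compares two correlation distributions. The sketch below is a hedged illustration; the exact signature difconet.run() expects should be taken from the package documentation.

```r
# A toy user-defined metric: average absolute change in coexpression
# between two vectors of correlation values (illustrative only).
abs_diff_metric <- function(cor_a, cor_b) {
  mean(abs(cor_a - cor_b))
}
abs_diff_metric(runif(100, -1, 1), runif(100, -1, 1))
```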
Routines for model-based functional cluster analysis for functional data with optional covariates. The idea is to cluster functional subjects (often called functional objects) into homogeneous groups by using spline smoothers (for the functional data) together with scalar covariates. The spline coefficients and the covariates are modelled as a multivariate Gaussian mixture model, where the number of mixture components corresponds to the number of clusters. The parameters of the model are estimated by maximizing the observed mixture likelihood via an EM algorithm (Arnqvist and Sjöstedt de Luna, 2019) <doi:10.48550/arXiv.1904.10265>. The clustering method is used to analyze annual lake sediments from lake Kassjön (Northern Sweden), which cover more than 6400 years and can be seen as historical records of weather and climate.
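The modelling idea (spline coefficients plus scalar covariates, fed into a Gaussian mixture) can be illustrated generically; the sketch below uses splines::bs() and mclust::Mclust() for brevity and is not this package's own API or EM implementation.

```r
# Generic illustration: represent each curve by B-spline coefficients,
# append a scalar covariate, and fit a two-component Gaussian mixture.
library(splines)
library(mclust)
set.seed(7)
t_grid <- seq(0, 1, length.out = 50)
B <- bs(t_grid, df = 6)                      # common spline basis
curves <- replicate(40,
  sin(2 * pi * t_grid) * runif(1, 0.5, 2) + rnorm(50, 0, 0.1))
coefs <- t(apply(curves, 2, function(y) coef(lm(y ~ B - 1))))
covar <- rnorm(40)                           # one scalar covariate per subject
fit <- Mclust(cbind(coefs, covar), G = 2)    # mixture components = clusters
table(fit$classification)
```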
This package offers a sophisticated and versatile tool for creating and evaluating artificial-intelligence-based neural network models tailored for regression analysis on datasets with continuous target variables. Leveraging the power of neural networks, it allows users to experiment with various hidden-neuron configurations across two layers, optimizing model performance through 5-fold or 10-fold cross-validation. The package normalizes input data to ensure efficient training and assesses model accuracy using key metrics such as R squared (R2), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Percentage Error (PER). By storing and visualizing the best-performing models, it provides a comprehensive solution for precise and efficient regression modelling, making it a valuable tool for data scientists and researchers aiming to harness AI for predictive analytics.
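The workflow described (normalize, two hidden layers, k-fold cross-validation, RMSE) can be sketched with the generic neuralnet package; this is a hedged stand-in, not this package's own interface.

```r
# Hedged sketch of normalize + two-hidden-layer network + 5-fold CV.
library(neuralnet)
norm01 <- function(x) (x - min(x)) / (max(x) - min(x))
d <- as.data.frame(lapply(mtcars[, c("mpg", "wt", "hp")], norm01))
k <- 5
folds <- sample(rep(1:k, length.out = nrow(d)))
rmse <- sapply(1:k, function(i) {
  fit <- neuralnet(mpg ~ wt + hp, data = d[folds != i, ],
                   hidden = c(4, 2), stepmax = 1e6)  # two hidden layers
  pred <- predict(fit, d[folds == i, ])
  sqrt(mean((d$mpg[folds == i] - pred)^2))
})
mean(rmse)  # cross-validated RMSE
```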
Penalized regression methods, such as lasso and elastic net, are used in many biomedical applications when simultaneous regression coefficient estimation and variable selection are desired. However, missing data complicate the implementation of these methods, particularly when missingness is handled using multiple imputation. Applying a variable selection algorithm to each imputed dataset will likely lead to different sets of selected predictors, making it difficult to determine a final active set without resorting to ad hoc combination rules. miselect provides the Stacked Adaptive Elastic Net (saenet) and Grouped Adaptive LASSO (galasso) for continuous and binary outcomes, developed by Du et al. (2022) <doi:10.1080/10618600.2022.2035739>. By construction, these force selection of the same variables across the multiply imputed datasets. miselect also provides cross-validated variants of these methods.
This package provides a method for the multiresolution analysis of spatial fields and images to capture scale-dependent features. mrbsizeR is based on scale space smoothing and uses differences of smooths at neighbouring scales to find features on different scales. Bayesian analysis is used to infer which of the captured features are credible. The scale space multiresolution analysis has three steps: (1) Bayesian signal reconstruction; (2) finding scale-dependent features of the reconstructed signal using differences of smooths; (3) posterior credibility analysis of the differences of smooths. The method was first proposed by Holmstrom, Pasanen, Furrer and Sain (2011) <doi:10.1016/j.csda.2011.04.011> and extended in Flury, Gerber, Schmid and Furrer (2021) <doi:10.1016/j.spasta.2020.100483>.
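The "differences of smooths at neighbouring scales" idea is easy to see in one dimension; the base-R sketch below is a generic illustration (the package itself works on spatial fields and images).

```r
# Smooth the same noisy signal at a fine and a coarse bandwidth; their
# difference isolates features living between the two scales.
set.seed(3)
x <- seq(0, 10, length.out = 300)
y <- sin(x) + 0.3 * sin(8 * x) + rnorm(300, 0, 0.1)
s_fine   <- ksmooth(x, y, kernel = "normal", bandwidth = 0.3)$y
s_coarse <- ksmooth(x, y, kernel = "normal", bandwidth = 2.0)$y
detail <- s_fine - s_coarse       # difference of smooths
plot(x, detail, type = "l", ylab = "difference of smooths")
```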
This package offers extensive tools for phylogenetic analysis. It focuses on phylogenetic comparative biology but also includes methods for visualizing, manipulating, reading, writing, and inferring phylogenetic trees. Functions for comparative biology include ancestral state reconstruction, model fitting, and simulation of phylogenies and trait data. A broad range of plotting methods includes mapping trait evolution on trees, projecting trees into phenotype space or onto geographic maps, and visualizing correlated speciation between trees. Additional functions cover tasks such as computing consensus trees, simulating trees and data under various models, and attaching species or clades to a tree either randomly or non-randomly. The package thus provides numerous tools for tree manipulation and analysis that are valuable for phylogenetic research.
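Two of the tasks listed above, tree simulation and ancestral state reconstruction, can be sketched as follows; pbtree(), fastBM() and fastAnc() are phytools functions, but the parameter values here are arbitrary.

```r
# Simulate a tree and a continuous trait, then estimate ancestral states.
library(phytools)
set.seed(11)
tree <- pbtree(n = 20)    # pure-birth tree with 20 tips
x <- fastBM(tree)         # Brownian-motion trait at the tips
anc <- fastAnc(tree, x)   # ML ancestral state estimates at internal nodes
head(anc)
```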
The Satellite Application Facility on Climate Monitoring (CM SAF) is a ground segment of the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT) and one of EUMETSAT's Satellite Application Facilities. The CM SAF contributes to the sustainable monitoring of the climate system by providing essential climate variables related to the energy and water cycle of the atmosphere (<https://www.cmsaf.eu>). It is a joint cooperation of eight National Meteorological and Hydrological Services. The cmsafvis R package provides a collection of R operators for the analysis and visualization of CM SAF NetCDF data. CM SAF climate data records are provided for free via <https://wui.cmsaf.eu/safira>. Detailed information and test data are provided on the CM SAF webpage (<http://www.cmsaf.eu/R_toolbox>).
Functions to read in and analyze education survey and assessment data from the National Center for Education Statistics (NCES) <https://nces.ed.gov/>, including National Assessment of Educational Progress (NAEP) data <https://nces.ed.gov/nationsreportcard/>; data from the International Assessment Database of the Organisation for Economic Co-operation and Development (OECD) <https://www.oecd.org/>, including the Programme for International Student Assessment (PISA), the Teaching and Learning International Survey (TALIS), and the Programme for the International Assessment of Adult Competencies (PIAAC); and data from the International Association for the Evaluation of Educational Achievement (IEA) <https://www.iea.nl/>, including the Trends in International Mathematics and Science Study (TIMSS), TIMSS Advanced, the Progress in International Reading Literacy Study (PIRLS), the International Civic and Citizenship Study (ICCS), the International Computer and Information Literacy Study (ICILS), and the Civic Education Study (CivEd).
This package provides functions for the density, cumulative distribution function, quantile function, and random number generation of neo-normal distributions. It also interfaces with the brms package, allowing the use of the neo-normal distributions as a custom family; this integration enables the application of various brms formulas for neo-normal regression. The package implements the following distributions: Modified to be Stable as Normal from Burr (MSNBurr), Modified to be Stable as Normal from Burr-IIa (MSNBurr-IIa), generalized MSNBurr (GMSNBurr), and Jones-Faddy skew-t. References: Choir, A. S. (2020), unpublished dissertation; Iriawan, N. (2000), unpublished dissertation; Jones, M. C. and Faddy, M. J. (2003) <doi:10.1111/1467-9868.00378>; Rigby, R. A., Stasinopoulos, M. D., Heller, G. Z. and Bastiani, F. D. (2019) <doi:10.1201/9780429298547>.
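Assuming the usual d/p/q/r naming convention implied above, the MSNBurr density could be evaluated as in the sketch below; the function and argument names here are assumptions, not the documented API.

```r
# Hedged sketch: dmsnburr() is an assumed name following the d/p/q/r
# convention for the MSNBurr distribution; check the package docs.
library(neodistr)   # package name assumed
x <- seq(-5, 5, by = 0.1)
plot(x, dmsnburr(x, mu = 0, sigma = 1, alpha = 2),
     type = "l", ylab = "MSNBurr density")
```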
Analyse light spectra for visual and non-visual (often called melanopic) needs, wrapped up in a Shiny app. Spectran allows for the import of spectra in various CSV forms, but also provides a wide range of example spectra and even the creation of one's own spectral power distributions. The goal of the app is to provide easy access to, and a visual overview of, the spectral calculations underlying common parameters used in the field. It is thus ideal for educational purposes or for creating presentation-ready graphs in lighting research and application. Spectran uses equations and action spectra described in CIE S026 (2018) <doi:10.25039/S026.2018>, DIN/TS 5031-100 (2021) <doi:10.31030/3287213>, and ISO/CIE 23539 (2023) <doi:10.25039/ISO.CIE.23539.2023>.
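The core calculation behind parameters such as melanopic output is a wavelength-weighted integral of the spectrum against an action spectrum. The base-R sketch below uses made-up curves, not the CIE S026 tabulated values.

```r
# Toy spectrum and toy action spectrum on a 1 nm grid; the weighted sum
# approximates the integral of S(lambda) * s_mel(lambda) d(lambda).
wl  <- 380:780                           # wavelengths in nm
spd <- dnorm(wl, mean = 560, sd = 60)    # toy spectral power distribution
s_mel <- dnorm(wl, mean = 490, sd = 40)  # toy melanopic action spectrum
s_mel <- s_mel / max(s_mel)              # normalize to peak 1
melanopic_output <- sum(spd * s_mel)     # discrete approximation of integral
melanopic_output
```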
An approach and software for modelling marine and freshwater ecosystems, articulated entirely around trophic levels. EcoTroph's key displays are bivariate plots with trophic levels on the abscissa and biomass flows or related quantities on the ordinate. Trophic ecosystem functioning can thus be modelled as a continuous flow of biomass surging up the food web, from lower to higher trophic levels, due to predation and ontogenic processes. Such an approach, wherein species as such disappear, may be viewed as the ultimate stage in the use of the trophic level metric for ecosystem modelling, providing a simplified but potentially useful caricature of ecosystem functioning and of the impacts of fishing. This version contains a catch trophic spectrum analysis (CTSA) function and corrected versions of the mf.diagnosis and create.ETmain functions.
Two implementations of canonical correlation analysis (CCA) that are based on iterated regression. By choosing the appropriate regression algorithm for each data domain, it is possible to enforce sparsity, non-negativity or other kinds of constraints on the projection vectors. Multiple canonical variables are computed sequentially using a generalized deflation scheme, where the additional correlation not explained by previous variables is maximized. nscancor() is used to analyze paired data from two domains, and has the same interface as cancor() from the stats package (plus some extra parameters). mcancor() is appropriate for analyzing data from three or more domains. See <https://sigg-iten.ch/learningbits/2014/01/20/canonical-correlation-analysis-under-constraints/> and Sigg et al. (2007) <doi:10.1109/MLSP.2007.4414315> for more details.
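Since nscancor() is stated to share cancor()'s interface, the baseline call it mirrors looks as follows; the extra constraint-enforcing parameters of nscancor() (regression callbacks per domain) are omitted here because their exact form should be taken from the package documentation.

```r
# Unconstrained baseline with stats::cancor(); nscancor(x, y, ...) follows
# the same call shape, with additional constraint parameters.
set.seed(5)
x <- matrix(rnorm(100 * 5), 100, 5)   # data from domain 1
y <- matrix(rnorm(100 * 3), 100, 3)   # data from domain 2
stats::cancor(x, y)$cor               # canonical correlations
```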
Practitioners of Bayesian statistics often use Markov chain Monte Carlo (MCMC) samplers to sample from a posterior distribution. This package determines whether the MCMC sample is large enough to yield reliable estimates of the target distribution. In particular, it calculates a Gelman-Rubin convergence diagnostic using stable and consistent estimators of Monte Carlo variance. Additionally, it uses the connection between an MCMC sample's effective sample size and the Gelman-Rubin diagnostic to produce a threshold for terminating MCMC simulation. Finally, it informs the user whether enough samples have been collected and, if necessary, estimates the number of samples needed for a desired level of accuracy. The theory underlying these methods can be found in "Revisiting the Gelman-Rubin Diagnostic" by Vats and Knudson (2018) <arXiv:1812.09384>.
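For orientation, the classic (non-stable) Gelman-Rubin statistic for m chains of length n can be computed in base R as below; the package's contribution is to replace the naive between/within variance estimators with stable, consistent Monte Carlo variance estimators.

```r
# Classic potential scale reduction factor from m = 4 toy chains.
set.seed(9)
chains <- replicate(4, cumsum(rnorm(1000)) / sqrt(1:1000), simplify = FALSE)
n <- 1000
means <- sapply(chains, mean)
B <- n * var(means)                          # between-chain variance
W <- mean(sapply(chains, var))               # within-chain variance
Rhat <- sqrt(((n - 1) / n * W + B / n) / W)  # Gelman-Rubin diagnostic
Rhat
```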
This package provides a unifying framework for managing and deploying shiny applications that consist of modules, where an "app" is a tab-based workflow that guides a user step-by-step through an analysis. The shinymgr app builder "stitches" shiny modules together so that outputs from one module serve as inputs to the next, creating an analysis pipeline that is easy to implement and maintain. Users of shinymgr apps can save analyses as an RDS file that fully reproduces the analytic steps and can be ingested into an R Markdown report for rapid reporting. In short, developers use the shinymgr framework to write modules and seamlessly combine them into shiny apps, and users of these apps can execute reproducible analyses that can be incorporated into reports for rapid dissemination.
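The module pattern that shinymgr "stitches" together can be illustrated with plain shiny code; the sketch below is the generic shiny module mechanism, not shinymgr's own builder format.

```r
# Generic shiny modules: module "a" returns a reactive that module "b"
# consumes as input, i.e. an output-to-input pipeline.
library(shiny)

mod1_ui <- function(id) numericInput(NS(id, "n"), "n", value = 3)
mod1_server <- function(id) {
  moduleServer(id, function(input, output, session) {
    reactive(input$n * 2)                  # module 1 returns a reactive
  })
}

mod2_ui <- function(id) textOutput(NS(id, "res"))
mod2_server <- function(id, upstream) {
  moduleServer(id, function(input, output, session) {
    output$res <- renderText(upstream())   # module 2 consumes module 1's output
  })
}

ui <- fluidPage(mod1_ui("a"), mod2_ui("b"))
server <- function(input, output, session) {
  out1 <- mod1_server("a")
  mod2_server("b", upstream = out1)        # "stitch" the two modules together
}
# shinyApp(ui, server)
```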
Render R Markdown to Markdown (without using knitr), and Markdown to lightweight HTML or LaTeX documents with the commonmark package (instead of Pandoc). Some Markdown features missing from commonmark are also supported, such as raw HTML or LaTeX blocks, LaTeX math, superscripts, subscripts, footnotes, element attributes, and appendices, but not all Pandoc Markdown features are (or will be) supported. With additional JavaScript and CSS, you can also create HTML slides and articles. This package can be viewed as a trimmed-down version of R Markdown and knitr. It does not aim at rich Markdown features or a large variety of output formats (the primary output formats are HTML and LaTeX). Book and website projects consisting of multiple input documents are also supported.
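Since the conversion engine is the commonmark package named above, its core renderers can be previewed directly; the calls below are commonmark's own API, not this package's higher-level wrapper.

```r
# Bare commonmark conversions, before this package layers on the extra
# Markdown features described above.
commonmark::markdown_html("Hello **world**")   # Markdown -> HTML fragment
commonmark::markdown_latex("Hello **world**")  # Markdown -> LaTeX fragment
```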
High-performance variant of apply() for a fixed set of functions. The considerable speedup of this implementation is a trade-off for universality: user-defined functions cannot be used with this package. However, about 20 of the most commonly employed functions are available. They can be divided into three types: reducing functions (like mean(), sum() etc., giving a scalar when applied to a vector), mapping functions (like normalise(), cumsum() etc., giving a vector of the same length as the input vector) and, finally, vector-reducing functions (like diff(), which produces a result vector of a length different from that of the input vector). Optional or mandatory additional arguments required by some functions (e.g. the norm type for norm()) can be passed as named arguments in '...'.
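The three function types named above are easy to see in base-R terms; the package's own fixed-function apply variant is not shown here.

```r
# Reducing, mapping, and vector-reducing functions applied column-wise.
m <- matrix(1:12, nrow = 3)
apply(m, 2, mean)     # reducing: vector -> scalar per column
apply(m, 2, cumsum)   # mapping: vector -> same-length vector per column
apply(m, 2, diff)     # vector-reducing: vector -> shorter vector per column
```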
Propagate uncertainty from several estimates when combining them via a function. This is done by using the parametric bootstrap to simulate values from the distribution of each estimate and build up an empirical distribution of the combined parameter. Finally, either the percentile method or the highest density interval is used to derive a confidence interval for the combined parameter with the desired coverage. Gaussian copulas are used when parameters are assumed to be dependent/correlated. References: Davison and Hinkley (1997, ISBN:0-521-57471-4) for the parametric bootstrap and percentile method; Gelman et al. (2014, ISBN:978-1-4398-4095-5) for the highest density interval; Stockdale et al. (2020) <doi:10.1016/j.jhep.2020.04.008> for an example of combining conditional prevalences.
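The underlying idea fits in a few lines of base R: simulate each estimate from its assumed sampling distribution, push the draws through the combining function, and take percentile limits. The sketch below assumes independent estimates (no copula) and arbitrary beta distributions.

```r
# Parametric bootstrap for a combined parameter (here a product of two
# prevalence-like estimates), with a 95% percentile interval.
set.seed(13)
p1 <- rbeta(1e5, 20, 80)              # draws for estimate 1
p2 <- rbeta(1e5, 10, 90)              # draws for estimate 2
combined <- p1 * p2                   # combining function
quantile(combined, c(0.025, 0.975))   # percentile confidence interval
```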
This package provides functions to import vectorial data and derive landscape connectivity metrics in habitat or matrix systems. It also includes an approach to assess the contribution of individual patches to overall landscape connectivity, enabling the prioritization of habitat patches. The computation of landscape connectivity and patch importance is very useful in landscape ecology research. The metrics available are: number of components, number of links, size of the largest component, mean size of components, class coincidence probability, landscape coincidence probability, characteristic path length, expected cluster size, area-weighted flux, and integral index of connectivity. References: Pascual-Hortal, L. and Saura, S. (2006) <doi:10.1007/s10980-006-0013-z>; Urban, D. and Keitt, T. (2001) <doi:10.2307/2679983>; Laita, A., Kotiaho, J. and Monkkonen, M. (2011) <doi:10.1007/s10980-011-9620-4>.
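Two of the listed graph-theoretic metrics can be computed on a toy patch network with igraph; this is a generic illustration of the underlying idea, not this package's vector-data workflow.

```r
# Number of components and largest-component size for a 4-patch network.
library(igraph)
adj <- matrix(c(0, 1, 0, 0,
                1, 0, 0, 0,
                0, 0, 0, 1,
                0, 0, 1, 0), 4, 4, byrow = TRUE)  # links between patches
g <- graph_from_adjacency_matrix(adj, mode = "undirected")
comp <- components(g)
comp$no           # number of components
max(comp$csize)   # size of the largest component (in patches)
```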