Reproducibility assessment is essential in extracting reliable scientific insights from high-throughput experiments. While the Irreproducible Discovery Rate (IDR) method has been instrumental in assessing reproducibility, its standard implementation is constrained to handling only two replicates. Package eCV introduces an enhanced Coefficient of Variation (eCV) metric to assess the likelihood of omic features being reproducible. Additionally, it offers alternatives to the IDR calculations for multi-replicate experiments. These tools are valuable for analyzing high-throughput data in genomics and other omics fields. The methods implemented in eCV are described in Gonzalez-Reymundez et al. (2023) <doi:10.1101/2023.12.18.572208>.
This package provides a consistent representation of year-based time scales as a numeric vector with an associated era. There are built-in era definitions for many year numbering systems used in contemporary and historic calendars (e.g. Common Era, Islamic Hijri years); year-based time scales used in archaeology, astronomy, geology, and other palaeosciences (e.g. Before Present, SI-prefixed annus); and support for arbitrary user-defined eras. Years can be converted from any one era to another using a generalised transformation function. Methods are also provided for robust casting and coercion between years and other numeric types, type-stable arithmetic with years, and pretty-printing in tables.
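To illustrate what a generalised era transformation can look like, here is a minimal Python sketch. It is not the package's API: the `Era` class, its fields (`epoch`, `unit`, `direction`) and the `convert` function are hypothetical names chosen for this example. It assumes every era can be described by a linear mapping onto one shared reference axis, which ignores the missing year zero in the Common Era and the different year length of lunar calendars such as the Hijri calendar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Era:
    """Hypothetical linear description of a year-numbering system."""
    label: str
    epoch: float    # the era's origin, expressed on a shared reference axis
    unit: float     # years represented by one unit (1000 for "ka")
    direction: int  # +1 counts forwards in time, -1 counts backwards


def convert(value: float, source: Era, target: Era) -> float:
    """Map a year from `source` to `target` via the shared reference axis."""
    reference_year = source.epoch + source.direction * source.unit * value
    return (reference_year - target.epoch) / (target.direction * target.unit)


# Illustrative era definitions (assumed values; "Before Present" counts back from 1950).
CE = Era("CE", epoch=0.0, unit=1.0, direction=+1)
BP = Era("BP", epoch=1950.0, unit=1.0, direction=-1)
KA_BP = Era("ka BP", epoch=1950.0, unit=1000.0, direction=-1)

years_ce = convert(3000, BP, CE)     # 3000 BP on the CE axis
years_bp = convert(11.7, KA_BP, BP)  # 11.7 ka BP expressed in plain years BP
```

Keeping the epoch, unit and direction as data means a user-defined era only needs three numbers, and any pair of eras can be converted with the same function.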
In medical research, supervised heterogeneity analysis has important implications. Assume that there are two types of features. Using both types of features, our goal is to conduct the first supervised heterogeneity analysis that satisfies a hierarchical structure. That is, the first type of features defines a rough structure, and the second type defines a nested and more refined structure. A penalization approach is developed, which has been motivated by but differs significantly from penalized fusion and sparse group penalization. Reference: Ren, M., Zhang, Q., Zhang, S., Zhong, T., Huang, J. & Ma, S. (2022). "Hierarchical cancer heterogeneity analysis based on histopathological imaging features". Biometrics, <doi:10.1111/biom.13426>.
In a typical experiment for the intuitive judgment of frequencies (JoF) different stimuli with different frequencies are presented. The participants consider these stimuli with a constant duration and give a judgment of frequency. These judgments can be simulated by formal models: PASS 1 and PASS 2 based on Sedlmeier (2002, ISBN:978-0198508632), MINERVA 2 based on Hintzman (1984) <doi:10.3758/BF03202365> and TODAM 2 based on Murdock, Smith & Bai (2001) <doi:10.1006/jmps.2000.1339>. The package provides an assessment of the frequency by determining the core aspects of these four models (attention, decay, and presented frequency) that can be compared to empirical results.
This package provides a financial calculator with very fast implementations of common financial indicators written in Rust. It includes functions for bond-related indicators, such as yield to maturity ('YTM'), modified duration, and Macaulay duration, as well as functions for calculating time-weighted and money-weighted rates of return (using the Modified Dietz method) for multiple portfolios, given their market values and profit and loss ('PnL') data. fcl is designed to be efficient and accurate for financial analysis and computation. The methods used in this package are based on the following references: <https://en.wikipedia.org/wiki/Modified_Dietz_method>, <https://en.wikipedia.org/wiki/Time-weighted_return>.
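As a rough illustration of the Modified Dietz method mentioned above, the following Python sketch computes the return for a single period from start and end market values plus dated external cash flows. It is not the fcl interface: the `CashFlow` class and the `modified_dietz` function are hypothetical names, and the input layout (explicit cash flows rather than PnL series) is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CashFlow:
    day: int       # day within the period on which the flow occurs (0 = period start)
    amount: float  # positive = contribution, negative = withdrawal


def modified_dietz(start_value: float, end_value: float,
                   flows: Sequence[CashFlow], period_days: int) -> float:
    """Modified Dietz return: gain divided by time-weighted average capital."""
    net_flow = sum(f.amount for f in flows)
    # Each flow is weighted by the fraction of the period it was invested.
    weighted_flow = sum(f.amount * (period_days - f.day) / period_days for f in flows)
    average_capital = start_value + weighted_flow
    if average_capital == 0:
        raise ValueError("average capital is zero; the return is undefined")
    return (end_value - start_value - net_flow) / average_capital


# 100 at the start, 10 contributed halfway through a 30-day period, 120 at the end.
example_return = modified_dietz(100.0, 120.0, [CashFlow(day=15, amount=10.0)], period_days=30)
```

In this example the gain is 10 and the average capital is 105, so the return is 10/105, roughly 9.5%.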
Oak declines are complex disease syndromes and consist of many visual indicators that include aspects of tree size, crown condition and trunk condition. This can cause difficulty in the manual classification of symptomatic and non-symptomatic trees from what is in reality a broad spectrum of oak tree health condition. Two phenotypic oak decline indexes have been developed to quantitatively describe and differentiate oak decline syndromes in Quercus robur. This package provides a toolkit to generate these decline indexes from phenotypic descriptors using the machine learning algorithm random forest. The methodology for generating these indexes is outlined in Finch et al. (2021) <doi:10.1016/j.foreco.2021.118948>.
This package provides tools to easily access and analyze Canadian Election Study data. The package simplifies the process of downloading, cleaning, and using CES datasets for political science research and analysis. The Canadian Election Study ('CES') has been conducted during federal elections since 1965, surveying Canadians on their political preferences, engagement, and demographics. Data is accessed from multiple sources including the Borealis Data repository <https://borealisdata.ca/> and the official Canadian Election Study website <https://ces-eec.arts.ubc.ca/>. This package is not officially affiliated with the Canadian Election Study, Borealis Data, or the University of British Columbia, and users should cite the original data sources in their work.
The Directed Prediction Index ('DPI') is a causal discovery method for observational data designed to quantify the relative endogeneity of outcome (Y) versus predictor (X) variables in regression models. By comparing the coefficients of determination (R-squared) between the Y-as-outcome and X-as-outcome models while controlling for sufficient confounders and simulating k random covariates, it can quantify relative endogeneity, providing a necessary but insufficient condition for causal direction from a less endogenous variable (X) to a more endogenous variable (Y). Methodological details are provided at <https://psychbruce.github.io/DPI/>. This package also includes functions for data simulation and network analysis (correlation, partial correlation, and Bayesian Networks).
This package provides a collection of utility functions related to binary incomplete block designs. It contains functions to generate A- and D-efficient binary incomplete block designs with a given number of treatments, number of blocks and block size, and a function to generate an incomplete block design with a specified concurrence matrix. There are functions to generate balanced treatment incomplete block designs and incomplete block designs for test versus control treatment comparisons with a specified concurrence matrix. It allows performing analysis of variance of data and computing estimated marginal means of factors from experiments using a connected incomplete block design. Hypothesis tests of treatment contrasts in an incomplete block design setup are also supported.
Several functions for maximum likelihood estimation of various univariate and multivariate distributions. The list includes more than 100 functions for univariate continuous and discrete distributions, covering distributions on the real line, on the positive line, on restricted intervals, and circular distributions. Multivariate continuous and discrete distributions, and distributions for compositional and directional data, among others, are also covered. Some references include Johnson N. L., Kotz S. and Balakrishnan N. (1994). "Continuous Univariate Distributions, Volume 1" <ISBN:978-0-471-58495-7>, Johnson N. L., Kemp A. W. and Kotz S. (2005). "Univariate Discrete Distributions" <ISBN:978-0-471-71580-1> and Mardia K. V. and Jupp P. E. (2000). "Directional Statistics" <ISBN:978-0-471-95333-3>.
Implementation of the Open Perimetry Interface (OPI) for simulating and controlling visual field machines using R. The OPI is a standard for interfacing with visual field testing machines (perimeters) that began as an open source project with support of Haag-Streit in 2010. It specifies basic functions that allow many visual field tests to be constructed. As of February 2022 it is fully implemented on the Haag-Streit Octopus 900 and CrewT ImoVifa ('Topcon Tempo') with partial implementations on the Centervue Compass, Kowa AP 7000 and Android phones. It also has a cousin: the R package visualFields, which has tools for analysing and manipulating visual field data.
Adversarial random forests (ARFs) recursively partition data into fully factorized leaves, where features are jointly independent. The procedure is iterative, with alternating rounds of generation and discrimination. Data becomes increasingly realistic at each round, until original and synthetic samples can no longer be reliably distinguished. This is useful for several unsupervised learning tasks, such as density estimation and data synthesis. Methods for both are implemented in this package. ARFs naturally handle unstructured data with mixed continuous and categorical covariates. They inherit many of the benefits of random forests, including speed, flexibility, and solid performance with default parameters. For details, see Watson et al. (2023) <https://proceedings.mlr.press/v206/watson23a.html>.
The outcome of XCMS data processing strongly depends on the parameter settings. IPO ('Isotopologue Parameter Optimization') is a fast parameter optimization tool, free of labeling steps, that is applicable to different kinds of samples and to liquid chromatography coupled to high-resolution mass spectrometry devices. IPO uses natural, stable 13C isotopes to calculate a peak picking score. Retention time correction is optimized by minimizing the relative retention time differences within features, and grouping parameters are optimized by maximizing the number of features showing exactly one peak from each injection of a pooled sample. The different parameter settings are achieved by design of experiments. The resulting scores are evaluated using response surface models.
NeuroAnatomy Toolbox (nat) enables analysis and visualisation of 3D biological image data, especially traced neurons. Reads and writes 3D images in NRRD and Amira AmiraMesh formats and reads surfaces in Amira hxsurf format. Traced neurons can be imported from and written to SWC and Amira LineSet and SkeletonGraph formats. These data can then be visualised in 3D via rgl, manipulated including applying calculated registrations, e.g. using the CMTK registration suite, and analysed. There is also a simple representation for neurons that have been subjected to 3D skeletonisation but not formally traced; this allows morphological comparison between neurons including searches and clustering (via the nat.nblast extension package).
This package provides an R interface for SSW (Striped Smith-Waterman) via its Python binding ssw-py. SSW is a fast C and C++ implementation of the Smith-Waterman algorithm for pairwise sequence alignment using Single-Instruction-Multiple-Data (SIMD) instructions. SSW enhances the standard algorithm by efficiently returning alignment information and suboptimal alignment scores. The core SSW library offers performance improvements for various bioinformatics tasks, including protein database searches, short-read alignments, primary and split-read mapping, structural variant detection, and read-overlap graph generation. These features make SSW particularly useful for genomic applications. Zhao et al. (2013) <doi:10.1371/journal.pone.0082138> developed the original C and C++ implementation.
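For readers unfamiliar with the underlying algorithm, the sketch below shows the basic Smith-Waterman dynamic program for a local alignment score in plain Python. It is deliberately not the striped SIMD variant that SSW implements, and it does not use the ssw-py API. The function name `smith_waterman_score`, the scoring values, and the linear (rather than affine) gap penalty are assumptions chosen to keep the example short.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of `a` and `b` with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                         # start a new local alignment here
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,             # gap in b
                curr[j - 1] + gap,         # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


score = smith_waterman_score("ACACACTA", "AGCACACA")
```

Clamping every cell at zero is what makes the alignment local: a poorly matching prefix never drags down the score of a good region that follows it.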
This is a new version of the userfriendlyscience package, which has grown a bit unwieldy. Therefore, distinct functionalities are being consciously uncoupled into different packages. This package contains the general-purpose tools and utilities (see the behaviorchange package, the rosetta package, and the soon-to-be-released scd package for other functionality), and is the most direct successor of the original userfriendlyscience package. For example, this package contains a number of basic functions to create higher level plots, such as diamond plots, to easily plot sampling distributions, to generate confidence intervals, to plan study sample sizes for confidence intervals, and to do some basic operations such as (dis)attenuate effect size estimates.
Density, distribution, quantile function, random number generation for the BMT (Bezier-Montenegro-Torres) distribution. Torres-Jimenez C.J. and Montenegro-Diaz A.M. (2017) <doi:10.48550/arXiv.1709.05534>. Moments, descriptive measures and parameter conversion for different parameterizations of the BMT distribution. Fit of the BMT distribution to non-censored data by maximum likelihood, moment matching, quantile matching, maximum goodness-of-fit, also known as minimum distance, maximum product of spacing, also called maximum spacing, and minimum quantile distance, which can also be called maximum quantile goodness-of-fit. Fit of univariate distributions for non-censored data using maximum product of spacing estimation and minimum quantile distance estimation is also included.
This package provides tools for working with a new versatile discrete distribution, the db ("discretised Beta") distribution. This package provides density (probability), distribution, inverse distribution (quantile) and random data generation functions for the db family. It provides functions to conveniently carry out maximum likelihood estimation of parameters, and a variety of useful plotting functions. It provides goodness of fit tests and functions to calculate the Fisher information, different estimates of the hessian of the log likelihood and Monte Carlo estimation of the covariance matrix of the maximum likelihood parameter estimates. In addition, it provides analogous tools for working with the beta-binomial distribution, which has been proposed as a competitor to the db distribution.
This package provides a set of tools to perform multiple versions of the Mobility Oriented-Parity metric. This multivariate analysis helps to characterize levels of dissimilarity between a set of conditions of reference and another set of conditions of interest. If predictive models are transferred to conditions different from those over which models were calibrated (trained), this metric helps to identify transfer conditions that differ substantially from those of calibration. These tools are implemented following principles proposed in Owens et al. (2013) <doi:10.1016/j.ecolmodel.2013.04.011>, and expanded to obtain more detailed results that aid in interpretation as in Cobos et al. (2024) <doi:10.21425/fob.17.132916>.
The main goal is to make descriptive evaluations easier, so that bigger and more complex outputs can be created in less time and with less code. It introduces format containers with multilabels <https://documentation.sas.com/doc/en/pgmsascdc/v_067/proc/p06ciqes4eaqo6n0zyqtz9p21nfb.htm>, a more powerful summarise that can output every possible combination of the provided grouping variables in one go <https://documentation.sas.com/doc/en/pgmsascdc/v_067/proc/p0jvbbqkt0gs2cn1lo4zndbqs1pe.htm>, tabulation functions that can create any table in different styles <https://documentation.sas.com/doc/en/pgmsascdc/v_067/proc/n1ql5xnu0k3kdtn11gwa5hc7u435.htm>, and other more readable functions. The code is optimized to work fast even with datasets of over a million observations.
Using frequency matrices, very low frequency variants (VLFs) are assessed for amino acid and nucleotide sequences. The VLFs are then compared to see if they occur in only one member of a species, singleton VLFs, or if they occur in multiple members of a species, shared VLFs. The amino acid and nucleotide VLFs are then compared to see if they are concordant with one another. Amino acid VLFs are also assessed to determine if they lead to a change in amino acid residue type, and potential changes to protein structures. Based on Stoeckle and Kerr (2012) <doi:10.1371/journal.pone.0043992> and Phillips et al. (2023) <doi:10.3897/BDJ.11.e96480>.
This package provides a general framework for constructing variable importance plots from various types of machine learning models in R. Aside from some standard model-specific variable importance measures, this package also provides model-agnostic approaches that can be applied to any supervised learning algorithm. These include 1) an efficient permutation-based variable importance measure, 2) variable importance based on Shapley values (Strumbelj and Kononenko, 2014) <doi:10.1007/s10115-013-0679-x>, and 3) the variance-based approach described in Greenwell et al. (2018) <doi:10.48550/arXiv.1805.04755>. A variance-based method for quantifying the relative strength of interaction effects is also included (see the previous reference for details).
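The idea behind a permutation-based importance measure can be sketched in a few lines of Python. This is a generic illustration, not the package's implementation or interface: `permutation_importance`, `mean_squared_error`, the toy data and the toy model are all hypothetical. The sketch assumes importance is measured as the average increase in mean squared error after shuffling one feature column.

```python
import random
from typing import Callable, List, Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[List[float]], float],
                           X: List[List[float]], y: List[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Average increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy setup: the target and the model depend only on the first feature.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [3.0 * row[0] for row in X]


def toy_model(row: List[float]) -> float:
    return 3.0 * row[0]


scores = permutation_importance(toy_model, X, y)
```

Because the approach only needs a prediction function and a loss, it works with any fitted model, which is what makes it model-agnostic. Here the second feature is ignored by the model, so shuffling it leaves the error unchanged.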
This package provides a collection of functions for analyzing data typically collected or used by behavioral scientists. Examples of the functions include a function that compares groups in a factorial experimental design, a function that conducts two-way analysis of variance (ANOVA), and a function that cleans a data set generated by Qualtrics surveys. Some of the functions will require installing additional package(s). Such packages and other references are cited within the section describing the relevant functions. Many functions in this package rely heavily on these two popular R packages: Dowle et al. (2021) <https://CRAN.R-project.org/package=data.table>. Wickham et al. (2021) <https://CRAN.R-project.org/package=ggplot2>.