Bayesian network analysis is a form of probabilistic graphical modelling which derives from empirical data a directed acyclic graph (DAG) describing the dependency structure between random variables. An additive Bayesian network model consists of a DAG in which each node comprises a generalized linear model (GLM). Additive Bayesian network models are equivalent to Bayesian multivariate regression using graphical modelling; they generalise the usual multivariable regression (GLM) to multiple dependent variables. This package provides routines to help determine optimal Bayesian network models for a given data set, where these models are used to identify statistical dependencies in messy, complex data.
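To make the node-wise GLM idea concrete, here is a minimal base-R sketch (not this package's API) of scoring a single candidate DAG, x -> y -> z, by fitting one GLM per node with that node's parents as covariates and summing the node scores:

```r
set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, 1, plogis(d$x))
d$z <- rpois(200, exp(0.2 * d$y))

## One GLM per node; a node's parents enter as covariates.
fit_x <- glm(x ~ 1, family = gaussian, data = d)  # x has no parents
fit_y <- glm(y ~ x, family = binomial, data = d)  # parent: x
fit_z <- glm(z ~ y, family = poisson,  data = d)  # parent: y

## A DAG-level score is the sum of the node-wise scores; a structure
## search compares such scores across candidate DAGs.
dag_score <- -(AIC(fit_x) + AIC(fit_y) + AIC(fit_z))
dag_score
```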
This package provides a collection of psychometric methods to process item metadata and use target assessment and measurement blueprint constraints to assemble a test form. Currently two automatic test assembly (ATA) approaches are enabled. The weighted (positive) deviations method, wdm(), proposed by Swanson and Stocking (1993) <doi:10.1177/014662169301700205>, is implemented in its full specification, allowing for both item selection and test form refinement. The linear constraint programming approach, atalp(), uses the linear equation solver by Berkelaar et al. (2014) <http://lpsolve.sourceforge.net/5.5/> to enable a variety of approaches to select items.
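As a rough base-R sketch of the weighted (positive) deviations heuristic (an illustration of the selection idea only, not the package's wdm() implementation or its full specification; the pool, targets, and weights are invented for the example):

```r
set.seed(1)
pool   <- data.frame(diff = rnorm(100), time = runif(100, 1, 3))
target <- c(diff = 0, time = 60)   # blueprint targets for the form totals
weight <- c(diff = 1, time = 0.5)  # constraint weights

## Greedily add the item that minimises the weighted sum of
## positive deviations from the blueprint targets.
selected <- integer(0)
for (k in 1:30) {
  dev <- sapply(seq_len(nrow(pool)), function(i) {
    if (i %in% selected) return(Inf)
    tot <- colSums(pool[c(selected, i), ])
    sum(weight * pmax(tot - target, 0))  # positive deviations only
  })
  selected <- c(selected, which.min(dev))
}
pool[selected, ]
```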
Computations for approximations and alternatives for the DPQ (Density (pdf), Probability (cdf), and Quantile) functions for probability distributions in R. The primary focus is on (central and non-central) beta, gamma, and related distributions such as the chi-squared, F, and t. For several distribution functions, the package provides implementations of formulas from Johnson, Kotz, and Kemp (1992) <doi:10.1002/bimj.4710360207> and Johnson, Kotz, and Balakrishnan (1995) for discrete and continuous distributions, respectively. It is aimed at researchers working on these numerical approximation implementations, notably for the author's own use in improving the standard R pbeta(), qgamma(), etc.: the 'dpq' functions.
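For orientation, these are the standard R tail-computation idioms that such approximations build on (base R only; the package's own functions go further in the extreme tails):

```r
## Upper-tail probabilities: lower.tail = FALSE avoids the
## catastrophic cancellation hidden in 1 - pbeta(...).
1 - pbeta(0.999, 2, 3)                  # loses relative precision
pbeta(0.999, 2, 3, lower.tail = FALSE)  # accurate upper tail

## Log-scale versions stay usable far beyond double-precision range:
pgamma(1e4, shape = 2, lower.tail = FALSE, log.p = TRUE)
qgamma(-5000, shape = 2, lower.tail = FALSE, log.p = TRUE)
```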
Opinionated functions that enable easier and faster analysis of Viva Insights data. There are three main types of functions in wpa: (1) standard functions create a ggplot visual or a summary table based on a specific Viva Insights metric; (2) report generation functions generate HTML reports on a specific analysis area, e.g. Collaboration; (3) other miscellaneous functions cover more specific applications of Viva Insights data (e.g. Subject Line text mining). This package adheres to tidyverse principles and works well with the pipe syntax. wpa is built with beginner-to-intermediate R users in mind and is optimised for simplicity.
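A hedged sketch of the calling convention for the standard functions: collaboration_sum() and the sq_data demo dataset are recalled from the package and are assumptions here, not guaranteed by the description above.

```r
library(wpa)

## Standard functions take a query data frame plus a `return`
## argument selecting a ggplot visual or a summary table.
collaboration_sum(sq_data, return = "plot")   # ggplot visual
collaboration_sum(sq_data, return = "table")  # summary table

## Pipe-friendly, per the tidyverse-oriented design:
sq_data |> collaboration_sum(return = "table")
```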
This package provides a distance density clustering (DDC) algorithm in R. DDC uses dynamic time warping (DTW) to compute a similarity matrix, based on which cluster centers and cluster assignments are found; it inherits DTW arguments and constraints. The cluster centers are centroid points calculated using the DTW Barycenter Averaging (DBA) algorithm. The clustering process is divisive: at each iteration, cluster centers are updated and data are reassigned to cluster centers. Early stopping is possible. The output includes cluster centers and clustering assignments, as described in Ma et al. (2017) <doi:10.1109/ICDMW.2017.11>.
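A small sketch of the first step, building the pairwise DTW distance matrix that the clustering operates on, using the dtw package directly (not this package's own code):

```r
library(dtw)  # dtw() computes the pairwise alignment distance

set.seed(1)
series <- replicate(6, cumsum(rnorm(50)), simplify = FALSE)

## Pairwise DTW distance matrix of the kind DDC clusters on.
n <- length(series)
D <- matrix(0, n, n)
for (i in 1:(n - 1)) for (j in (i + 1):n) {
  D[i, j] <- D[j, i] <- dtw(series[[i]], series[[j]])$distance
}
round(D, 1)
```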
Various tools for the analysis of univariate, multivariate, and functional extremes. Exact simulation from max-stable processes (Dombry, Engelke and Oesting, 2016, <doi:10.1093/biomet/asw008>) and R-Pareto processes for various parametric models, including Brown-Resnick (Wadsworth and Tawn, 2014, <doi:10.1093/biomet/ast042>) and extremal Student (Thibaud and Opitz, 2015, <doi:10.1093/biomet/asv045>). Threshold selection methods, including Wadsworth (2016) <doi:10.1080/00401706.2014.998345> and Northrop and Coleman (2014) <doi:10.1007/s10687-014-0183-z>. Multivariate extreme diagnostics. Estimation and likelihoods for univariate extremes, e.g., Coles (2001) <doi:10.1007/978-1-4471-3675-0>.
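A hedged sketch of the max-stable simulation entry point (the rmev() call shape below is recalled from the package, not confirmed by this description; check ?rmev):

```r
library(mev)

## n exact samples from a d-variate max-stable vector with a
## logistic dependence model (dependence parameter assumed).
samp <- rmev(n = 100, d = 5, param = 2.5, model = "log")
dim(samp)  # 100 x 5, unit Frechet margins
```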
This package implements maximum likelihood and bootstrap methods based on the diversity-dependent birth-death process to test whether speciation or extinction is diversity-dependent, under various models including various types of key innovations. See Etienne et al. (2012), Proc. Roy. Soc. B 279: 1300-1309, <doi:10.1098/rspb.2011.1439>; Etienne & Haegeman (2012), Am. Nat. 180: E75-E89, <doi:10.1086/667574>; Etienne et al. (2016), Meth. Ecol. Evol. 7: 1092-1099, <doi:10.1111/2041-210X.12565>; and Laudanno et al. (2021), Syst. Biol. 70: 389-407, <doi:10.1093/sysbio/syaa048>. Also contains functions to simulate the diversity-dependent process.
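A hedged simulate-then-refit sketch (the dd_sim()/dd_ML() call shapes are recalled from the package and not confirmed by this description; see the help pages):

```r
library(DDD)

## Simulate a diversity-dependent tree, then re-estimate the
## parameters (lambda0, mu, K) by maximum likelihood.
sim <- dd_sim(pars = c(0.8, 0.1, 40), age = 10)  # lambda0, mu, K
fit <- dd_ML(brts = sim$brts)                    # branching times
```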
This package provides a collection of process capability index functions, such as C_p(), C_pk(), C_pm(), and others, along with metadata about each, such as LaTeX equations and R expressions. Its primary purpose is to form a foundation for other quality control packages to build on by providing basic resources and functions. The indices belong to the field of statistical quality control and quantify the degree to which a manufacturing process is able to create items that adhere to a certain standard of quality. For details see Montgomery, D. C. (2019, ISBN:978-1-119-39930-8).
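The standard index formulas involved, written out in base R for orientation (presumably what C_p(), C_pk(), and C_pm() encode alongside their LaTeX/metadata; the data and specification limits are invented):

```r
x <- rnorm(500, mean = 10.1, sd = 0.3)
USL <- 11; LSL <- 9; target <- 10
mu <- mean(x); s <- sd(x)

Cp  <- (USL - LSL) / (6 * s)                        # potential capability
Cpk <- min(USL - mu, mu - LSL) / (3 * s)            # accounts for centering
Cpm <- (USL - LSL) / (6 * sqrt(s^2 + (mu - target)^2))  # Taguchi index
c(Cp = Cp, Cpk = Cpk, Cpm = Cpm)
```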
In panel data settings, this package specifies a set of candidate models, fits them to data from pre-treatment validation periods, and selects the model as an average over the candidate models, weighting each by the posterior probability of being most robust given its differential average prediction errors in the pre-treatment validation periods. Subsequent estimation and inference of the causal effect's bounds accounts for both model and sampling uncertainty, and calculates the robustness changepoint value at which the bounds go from excluding to including 0. The package also includes a range of diagnostic plots, such as those illustrating the models' differential average prediction errors and the posterior distribution of which model is most robust.
Sequential and batch change detection for univariate data streams, using the change point model framework. Functions are provided to allow nonparametric, distribution-free change detection in the mean, variance, or general distribution of a given sequence of observations. Parametric change detection methods are also provided for Gaussian, Bernoulli, and exponential sequences. Both the batch (Phase I) and sequential (Phase II) settings are supported, and the sequences may contain either a single or multiple change points. A full description of this package is available in Ross, G. J. (2015), "Parametric and nonparametric sequential change detection in R", available at <https://www.jstatsoft.org/article/view/v066i03>.
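A short usage sketch with the detection functions described in the JSS paper (argument defaults assumed; see the help pages):

```r
library(cpm)

## A stream with a single mean shift, monitored sequentially with a
## nonparametric (Mann-Whitney) change point model.
set.seed(1)
x <- c(rnorm(100, 0), rnorm(100, 1.5))

res <- detectChangePoint(x, cpmType = "Mann-Whitney", ARL0 = 500)
res$changeDetected
res$detectionTime

## Multiple change points in a stream:
processStream(x, cpmType = "Mann-Whitney")$changePoints
```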
Maximum likelihood estimation for the semi-parametric joint modeling of competing risks and longitudinal data in the presence of heterogeneous within-subject variability, proposed by Li and colleagues (2023) <arXiv:2301.06584>. The proposed method models the within-subject variability of the biomarker and associates it with the risk of the competing-risks event. The time-to-event data is modeled using a (cause-specific) Cox proportional hazards regression model with time-fixed covariates. The longitudinal outcome is modeled using a mixed-effects location and scale model. The association is captured by shared random effects. The model is estimated using an Expectation-Maximization algorithm.
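Schematically, the coupling described above can be written as follows, where Y_ij is the biomarker, b_i and omega_i are the shared random effects, and lambda_ik is the cause-specific hazard for risk k (generic notation for orientation, not necessarily the authors' exact parameterization):

```latex
\begin{align*}
  Y_{ij} &= X_{ij}^{\top}\beta + Z_{ij}^{\top} b_i + e_{ij},
           \qquad e_{ij} \sim N(0, \sigma_i^2),        \\
  \log \sigma_i^2 &= W_i^{\top}\tau + \omega_i,        \\
  \lambda_{ik}(t) &= \lambda_{0k}(t)\,
     \exp\bigl(V_i^{\top}\gamma_k + \alpha_k^{\top} b_i + \nu_k\,\omega_i\bigr).
\end{align*}
```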
Time series decomposition for univariate time series using the "Verallgemeinerte Berliner Verfahren" (Generalized Berlin Method) as described in 'Kontinuierliche Messgrößen und Stichprobenstrategien in Raum und Zeit mit Anwendungen in den Natur-, Umwelt-, Wirtschafts- und Finanzwissenschaften' by Hebbel and Steuer, Springer Berlin Heidelberg, 2022 <doi:10.1007/978-3-662-65638-9>, or 'Decomposition of Time Series using the Generalised Berlin Method (VBV)' by Hebbel and Steuer, in Jan Beran, Yuanhua Feng, Hartmut Hebbel (eds.): Empirical Economic and Financial Research - Theory, Methods and Practice. Festschrift in Honour of Prof. Siegfried Heiler. Series: Advanced Studies in Theoretical and Applied Econometrics. Springer, 2014, pp. 9-40.
Computes Weighted Topological Overlap with positive and negative signs (wTO) networks given a data frame containing the mRNA count/expression/abundance per sample and a vector containing the nodes of interest (a subset of the elements of the full data frame). It also computes the cut-off threshold or p-value based on bootstrapping individuals or reshuffling the values per individual. It also allows the construction of a consensus network based on multiple wTO networks. The package includes a visualization tool for the networks. More about the methodology can be found at <doi:10.1186/s12859-018-2351-7>.
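A hedged usage sketch (the wTO.fast() argument names are recalled from the package, not confirmed by this description; see ?wTO.fast):

```r
library(wTO)

## Expression table with genes in rows and samples in columns, plus
## the subset of nodes of interest.
set.seed(1)
expr <- as.data.frame(matrix(rnorm(50 * 20), nrow = 50))
row.names(expr) <- paste0("gene", seq_len(50))

net <- wTO.fast(Data = expr, Overlap = row.names(expr)[1:10], method = "p")
head(net)
```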
Compare two classifications or clustering solutions that may or may not have the same number of classes, and that might have hard or soft (fuzzy, probabilistic) membership. Calculate various metrics to assess how the clusters compare to each other. The calculations are simple, but provide a handy tool for users unfamiliar with matrix multiplication. This package is not geared towards traditional accuracy assessment for classification/mapping applications; the motivating use case is comparing a probabilistic clustering solution to a set of reference or existing class labels that could have any number of classes (that is, without having to degrade the probabilistic clustering to hard classes).
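The matrix multiplication alluded to above, in plain R: a "soft" confusion matrix between a probabilistic clustering A and hard reference labels B is just the cross-product of the two membership matrices (toy data invented for the example):

```r
set.seed(1)
A <- matrix(runif(30), nrow = 10)        # 10 units x 3 fuzzy clusters
A <- A / rowSums(A)                      # soft memberships sum to 1
B <- diag(4)[sample(1:4, 10, TRUE), ]    # 10 units x 4 hard classes

conf_mat <- t(A) %*% B                   # 3 x 4 soft cross-tabulation
round(conf_mat, 2)
```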
Spatio-temporal Fixation Pattern Analysis (FPA) is a new method of analyzing eye movement data, developed by Mr. Jinlu Cao under the supervision of Prof. Chen Hsuan-Chih at The Chinese University of Hong Kong and Prof. Wang Suiping at the South China Normal University. The package "fpa" is an R implementation which makes FPA analysis much easier. There are four major functions in the package: ft2fp(), get_pattern(), plot_pattern(), and lineplot(). The function ft2fp() is the core function, which can complete all the preprocessing within moments. The other three are supporting functions which visualize the eye fixation patterns.
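A schematic pipeline only: the four functions are named in the description above, but fixation_data and all arguments below are placeholders; consult the package help pages for the real signatures.

```r
library(fpa)

fp  <- ft2fp(fixation_data)  # preprocess raw fixation report data
pat <- get_pattern(fp)       # extract spatio-temporal fixation patterns
plot_pattern(pat)            # visualize the patterns
lineplot(pat)                # alternative line-plot view
```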
Hypothesis tests for the means of independent or paired groups. This package checks the normality assumption automatically, then tests hypotheses about two independent or paired group means using parametric or non-parametric tests. It uses the Shapiro-Wilk test to check the normality assumption. For two independent groups, if the data come from the normal distribution, the package uses the Z-test or t-test according to whether the variances are known. For paired groups, it uses the paired t-test for normally distributed data. If the data do not come from the normal distribution, the package uses the Wilcoxon test in both the independent and paired cases.
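The decision flow described above, written out with base R tests (a sketch of the logic, not this package's own implementation; the known-variance Z-test branch is omitted for brevity):

```r
auto_test <- function(x, y, paired = FALSE, alpha = 0.05) {
  ## Shapiro-Wilk normality check on both samples.
  normal <- shapiro.test(x)$p.value > alpha &&
            shapiro.test(y)$p.value > alpha
  if (normal) {
    t.test(x, y, paired = paired)   # variances unknown -> t-test
  } else {
    wilcox.test(x, y, paired = paired)
  }
}
auto_test(rnorm(30), rnorm(30, 0.5))
```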
Integration of Earth system data from various sources is a challenging task. Apart from their qualitative heterogeneity, different data records exist that describe similar Earth system processes at different spatio-temporal scales. Data inter-comparison and validation are usually performed at a single spatial or temporal scale, which could hamper the identification of potential discrepancies at other scales. The csa package offers a simple, yet efficient, graphical method for synthesizing and comparing observed and modelled data across a range of spatio-temporal scales. Instead of focusing on specific scales, such as annual means or the original grid resolution, it examines how the statistical properties of the data change across the spatio-temporal continuum.
Finite element modeling of beam structures and 2D geometries using constant strain triangles. Applies material properties and boundary conditions (load and constraint) to generate a finite element model. The model produces stress, strain, and nodal displacements; a heat map is available to show regions where output variables are high or low. Also provides options for creating a triangular mesh of 2D geometries. Package developed with reference to: Bathe, K. J. (1996). Finite Element Procedures [ISBN 978-0-9790049-5-7] -- Seshu, P. (2012). Textbook of Finite Element Analysis [ISBN 978-81-203-2315-5] -- Mustapha, K. B. (2018). Finite Element Computations in Mechanics with R [ISBN 9781315144474].
When the values of the outcome variable Y are either 0 or 1, the function lsm() calculates the estimation of the log likelihood in the saturated model. This model is characterized by Llinas (2006, ISSN:2389-8976) in section 2.3 through assumptions 1 and 2. The function LogLik() works (almost perfectly) when the number of independent variables K is high, but for small K it calculates wrong values in some cases. For this reason, when Y is dichotomous and the data are grouped in J populations, it is recommended to use the function lsm() because it works very well for all K.
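A hedged usage sketch (the formula-interface call shape for lsm() is assumed from memory of the package, not confirmed by this description; see ?lsm):

```r
library(lsm)

set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, 1, plogis(d$x))  # dichotomous outcome

fit <- lsm(y ~ x, data = d)
fit   # includes the saturated-model log-likelihood estimate
```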
A few major genes and a series of polygenes are responsible for each quantitative trait. Major genes are individually identified, while polygenes are collectively detected. This is mixed major-gene-plus-polygene inheritance analysis, or segregation analysis (SEA). In the SEA, phenotypes from a single or multiple bi-parental segregating populations, along with their parents, are used to fit all the possible models, and the best model for the population phenotypic distributions is viewed as the model of the trait. There are fourteen types of population combinations available. See Zhang Yuan-Ming, Gai Jun-Yi, Yang Yong-Hua (2003) <doi:10.1017/S0016672303006141>.
This package provides a set of tests for compositional pathologies. Tests for coherence of correlations with aIc.coherent() as suggested by Erb et al. (2020) <doi:10.1016/j.acags.2020.100026>, compositional dominance of distance with aIc.dominant(), compositional perturbation invariance with aIc.perturb() as suggested by Aitchison (1992) <doi:10.1007/BF00891269>, and singularity of the covariation matrix with aIc.singular(). Currently tests five data transformations: prop, clr, TMM, TMMwsp, and RLE from the R packages ALDEx2, edgeR, and DESeq2 (Fernandes et al. (2014) <doi:10.1186/2049-2618-2-15>; Anders et al. (2013) <doi:10.1038/nprot.2013.099>).
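Schematic calls only: the four function names come from the description above, but the count table and the group argument are placeholders invented for illustration; check each help page for the documented interface.

```r
library(aIc)

counts <- matrix(rpois(2000, 10), nrow = 100)  # 100 features x 20 samples
group  <- rep(c("A", "B"), each = 10)          # hypothetical conditions

aIc.coherent(counts, group = group)  # coherence of correlations
aIc.dominant(counts, group = group)  # dominance of distances
aIc.perturb(counts, group = group)   # perturbation invariance
aIc.singular(counts, group = group)  # singular covariation matrix?
```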
Reproducibility assessment is essential in extracting reliable scientific insights from high-throughput experiments. While the Irreproducible Discovery Rate (IDR) method has been instrumental in assessing reproducibility, its standard implementation is constrained to handling only two replicates. The eCV package introduces an enhanced coefficient of variation (eCV) metric to assess the likelihood of omic features being reproducible, and additionally offers alternatives to IDR calculations for multi-replicate experiments. These tools are valuable for analyzing high-throughput data in genomics and other omics fields. The methods implemented in eCV are described in Gonzalez-Reymundez et al. (2023) <doi:10.1101/2023.12.18.572208>.
This package provides a consistent representation of year-based time scales as a numeric vector with an associated era. There are built-in era definitions for many year numbering systems used in contemporary and historic calendars (e.g. Common Era, Islamic Hijri years); year-based time scales used in archaeology, astronomy, geology, and other palaeosciences (e.g. Before Present, SI-prefixed annus); and support for arbitrary user-defined eras. Years can be converted from any one era to another using a generalised transformation function. Methods are also provided for robust casting and coercion between years and other numeric types, type-stable arithmetic with years, and pretty printing in tables.
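A hedged usage sketch (the yr()/yr_transform() call shapes are recalled from the package and not confirmed by this description; see the help pages):

```r
library(era)

## Attach an era to a numeric vector of years, then convert it to
## another year-numbering system.
x <- yr(c(10000, 11000, 12000), era = "cal BP")
yr_transform(x, era("BCE"))
```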
In medical research, supervised heterogeneity analysis has important implications. Assume that there are two types of features. Using both types of features, our goal is to conduct the first supervised heterogeneity analysis that satisfies a hierarchical structure. That is, the first type of features defines a rough structure, and the second type defines a nested and more refined structure. A penalization approach is developed, which has been motivated by but differs significantly from penalized fusion and sparse group penalization. Reference: Ren, M., Zhang, Q., Zhang, S., Zhong, T., Huang, J. & Ma, S. (2022). "Hierarchical cancer heterogeneity analysis based on histopathological imaging features". Biometrics, <doi:10.1111/biom.13426>.