This package creates and fits staged event tree probability models, which are probabilistic graphical models capable of representing asymmetric conditional independence statements for categorical variables. It includes functions to create, plot and fit staged event trees from data, as well as many efficient structure learning algorithms. References: Carli F., Leonelli M., Riccomagno E. and Varando G. (2022) <doi:10.18637/jss.v102.i06>; Collazo R. A., Görgen C. and Smith J. Q. (2018, ISBN:9781498729604); Görgen C., Bigatti A., Riccomagno E. and Smith J. Q. (2018) <arXiv:1705.09457>; Thwaites P. A. and Smith J. Q. (2017) <arXiv:1510.00186>; Barclay L. M., Hutton J. L. and Smith J. Q. (2013) <doi:10.1016/j.ijar.2013.05.006>; Smith J. Q. and Anderson P. E. (2008) <doi:10.1016/j.artint.2007.05.004>.
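As a quick orientation, the following is a minimal sketch of the package's workflow as described in the JSS paper; the function names full() and stages_bhc() are taken from that paper, and the toy data frame is invented for illustration:

library(stagedtrees)
set.seed(1)
d <- data.frame(
  A = factor(sample(c("yes", "no"), 500, replace = TRUE)),
  B = factor(sample(c("low", "high"), 500, replace = TRUE)),
  C = factor(sample(c("x", "y", "z"), 500, replace = TRUE))
)
full_model <- full(d)               # saturated staged event tree over A, B, C
fitted <- stages_bhc(full_model)    # backward hill-climbing structure search
plot(fitted)                        # plot the tree with its learned stages
summary(fitted)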
This package contains tools for survey statistics (especially in educational assessment) for datasets with replication designs (jackknife, bootstrap, replicate weights; see Kolenikov, 2010; Pfeffermann & Rao, 2009a, 2009b, <doi:10.1016/S0169-7161(09)70003-3>, <doi:10.1016/S0169-7161(09)70037-9>; Shao, 1996, <doi:10.1080/02331889708802523>). Descriptive statistics, linear and logistic regression, path models for manifest variables with measurement error correction and two-level hierarchical regressions for weighted samples are included. Statistical inference can be conducted for multiply imputed datasets and nested multiply imputed datasets and is particularly suited for the analysis of plausible values (for details see George, Oberwimmer & Itzlinger-Bruneforth, 2016; Bruneforth, Oberwimmer & Robitzsch, 2016; Robitzsch, Pham & Yanagida, 2016). The package development was supported by BIFIE (Federal Institute for Educational Research, Innovation and Development of the Austrian School System; Salzburg, Austria).
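The core computation behind replication designs can be illustrated with a generic delete-one-zone jackknife; this is a self-contained sketch of the idea, not this package's own interface, and all variable names are illustrative:

set.seed(42)
n <- 200
y <- rnorm(n, mean = 500, sd = 100)    # an achievement score
w <- runif(n, 0.5, 2)                  # final survey weights
G <- 20                                # number of jackknife zones
zone <- sample(seq_len(G), n, replace = TRUE)
repw <- sapply(seq_len(G), function(g) ifelse(zone == g, 0, w * G / (G - 1)))
est_full <- weighted.mean(y, w)
est_rep <- apply(repw, 2, function(wg) weighted.mean(y, wg))
var_jk <- (G - 1) / G * sum((est_rep - est_full)^2)   # JK1 variance estimate
c(estimate = est_full, se = sqrt(var_jk))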
This package provides a set of procedures for parametric and non-parametric modelling of the dependence structure of multivariate extreme values. The statistical inference is performed with non-parametric estimators, likelihood-based estimators and Bayesian techniques. It adapts the methodologies of Beranger and Padoan (2015) <doi:10.48550/arXiv.1508.05561>, Marcon et al. (2016) <doi:10.1214/16-EJS1162>, Marcon et al. (2017) <doi:10.1002/sta4.145>, Marcon et al. (2017) <doi:10.1016/j.jspi.2016.10.004> and Beranger et al. (2021) <doi:10.1007/s10687-019-00364-0>. This package also allows for the modelling of spatial extremes using flexible max-stable processes. It provides simulation algorithms and fitting procedures relying on the Stephenson-Tawn likelihood as per Beranger et al. (2021) <doi:10.1007/s10687-020-00376-1>.
Finite element modeling (FEM) uses meshes of triangles to define surfaces. A surface within a triangle may be either linear or quadratic. In the order one case each node in the mesh is associated with a basis function, and the basis is called the order one finite element basis. In the order two case each edge mid-point is also associated with a basis function. Functions are provided for smoothing, density function estimation, point evaluation and plotting of results. Two papers illustrating this kind of finite element data analysis are Sangalli, L.M., Ramsay, J.O. and Ramsay, T.O. (2013) <http://www.mox.polimi.it/~sangalli> and Bernardi, M.S., Carey, M., Ramsay, J.O. and Sangalli, L. (2018) "Modelling spatial anisotropy via regression with partial differential regularization", Journal of Multivariate Analysis, 167, 15-30 <http://www.mox.polimi.it/~sangalli>.
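Within a single triangle, the order one basis reduces to linear interpolation in barycentric coordinates; the following self-contained sketch (not this package's API) evaluates such a surface at one point:

tri <- rbind(c(0, 0), c(1, 0), c(0, 1))   # vertices of one mesh triangle
z <- c(1.0, 2.0, 0.5)                     # surface values at the three nodes
eval_linear_fem <- function(p, tri, z) {
  # barycentric coordinates b solve p = b1*v1 + b2*v2 + b3*v3 with sum(b) = 1
  A <- rbind(t(tri), c(1, 1, 1))
  b <- solve(A, c(p, 1))
  if (any(b < -1e-12)) return(NA_real_)   # point lies outside the triangle
  sum(b * z)                              # weighted sum of the three nodal basis functions
}
eval_linear_fem(c(0.25, 0.25), tri, z)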
This package provides a comprehensive framework for building, evaluating, and visualizing regression models for analyzing viral load and CD4 (Cluster of Differentiation 4) lymphocytes data. It leverages the principles of the tidymodels ecosystem of Max Kuhn and Hadley Wickham (2020) <https://www.tidymodels.org> to offer a user-friendly experience in model development. This package includes functions for data preprocessing, feature engineering, model training, tuning, and evaluation, along with visualization tools to enhance the interpretation of model results. It is specifically designed for researchers in biostatistics, computational biology, and HIV research who aim to perform reproducible and rigorous analyses to gain insights into disease dynamics. The main focus is on improving the understanding of the relationships between viral load, CD4 lymphocytes, and other relevant covariates to contribute to HIV research and the visibility of vulnerable seropositive populations.
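Because the package builds on the tidymodels ecosystem, a typical workflow of the kind it wraps looks roughly as follows; this is a hedged sketch using only standard tidymodels functions, and the data set and column names (viral_load, cd4, age) are hypothetical:

library(tidymodels)
set.seed(7)
dat <- tibble(
  viral_load = rlnorm(300, meanlog = 8, sdlog = 1),
  cd4 = rpois(300, lambda = 450),
  age = sample(18:70, 300, replace = TRUE)
)
split <- initial_split(dat, prop = 0.8)
rec <- recipe(viral_load ~ cd4 + age, data = training(split)) %>%
  step_log(viral_load, base = 10) %>%            # preprocessing / feature engineering
  step_normalize(all_numeric_predictors())
wf <- workflow() %>% add_recipe(rec) %>% add_model(linear_reg())
fitted_wf <- fit(wf, data = training(split))
predict(fitted_wf, testing(split))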
This package implements an approach aimed at assessing the accuracy and effectiveness of raw scores obtained in scales that contain locally dependent items. The program uses as input the calibration (structural) item estimates obtained from fitting extended unidimensional factor-analytic solutions in which the existing local dependencies are included. Measures of reliability (Omega) and information are proposed at three levels: (a) total score, (b) bivariate-doublet, and (c) item-by-item deletion, and are compared to those that would be obtained if all the items had been locally independent. All the implemented procedures can be obtained from: (a) linear factor-analytic solutions in which the item scores are treated as approximately continuous, and (b) non-linear solutions in which the item scores are treated as ordered-categorical. A detailed guide can be obtained at the following url.
Integrated species distribution modeling is a rising field in quantitative ecology thanks to significant increases in the quantity of data available, increases in computational speed and the proven benefits of using such models. Despite this, general software to help ecologists construct such models in an easy-to-use framework is lacking. We therefore introduce the R package 'PointedSDMs', which provides the tools to help ecologists set up integrated models and perform inference on them. There are also functions within the package to help run spatial cross-validation for model selection, as well as generic plotting and predicting functions. An introduction to these methods is discussed in Isaac, Jarzyna, Keil, Dambly, Boersch-Supan, Browning, Freeman, Golding, Guillera-Arroita, Henrys, Jarvis, Lahoz-Monfort, Pagel, Pescott, Schmucki, Simmonds and O'Hara (2020) <doi:10.1016/j.tree.2019.08.006>.
Computes segregation indices, including the Index of Dissimilarity, as well as the information-theoretic indices developed by Theil (1971) <isbn:978-0471858454>, namely the Mutual Information Index (M) and Theil's Information Index (H). The M, further described by Mora and Ruiz-Castillo (2011) <doi:10.1111/j.1467-9531.2011.01237.x> and Frankel and Volij (2011) <doi:10.1016/j.jet.2010.10.008>, is a measure of segregation that is highly decomposable. The package provides tools to decompose the index by units and groups (local segregation), and by within and between terms. The package also provides a method to decompose differences in segregation as described by Elbers (2021) <doi:10.1177/0049124121986204>. The package includes standard error estimation by bootstrapping, which also corrects for small sample bias. The package also contains functions for visualizing segregation patterns.
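As a concrete illustration of the central quantity, the Mutual Information Index for a units-by-groups contingency table can be written out directly; this self-contained sketch only spells out the definition, while the package computes it (plus decompositions and bootstrap standard errors) from long-format data:

tab <- matrix(c(40, 10,
                25, 25,
                 5, 45), nrow = 3, byrow = TRUE)   # 3 units x 2 groups
p <- tab / sum(tab)      # joint distribution over (unit, group)
pu <- rowSums(p)         # unit margins
pg <- colSums(p)         # group margins
M <- sum(p * log(p / outer(pu, pg)), na.rm = TRUE) # Mutual Information Index
H <- M / -sum(pg * log(pg))                        # Theil's H: M normalised by group entropy
c(M = M, H = H)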
Anscombe's quartet is a set of four two-variable datasets that have several common summary statistics but very different joint distributions. This becomes apparent when the data are plotted, which illustrates the importance of using graphical displays in Statistics. This package enables the creation of datasets that have identical marginal sample means and sample variances, sample correlation, least squares regression coefficients and coefficient of determination. The user supplies an initial dataset, which is shifted, scaled and rotated in order to achieve target summary statistics. The general shape of the initial dataset is retained. The target statistics can be supplied directly or calculated from a user-supplied dataset. The datasauRus package <https://cran.r-project.org/package=datasauRus> provides further examples of datasets that have markedly different scatter plots but share many sample summary statistics.
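The shift-and-scale step is easy to see for a single variable; the following self-contained sketch (the package additionally rotates the data to match correlations and regression coefficients) standardises an initial variable and then rescales it to hit a target mean and standard deviation exactly:

x <- faithful$eruptions                # any initial dataset
target_mean <- 10
target_sd <- 2
x_new <- (x - mean(x)) / sd(x) * target_sd + target_mean   # standardise, rescale, shift
c(mean = mean(x_new), sd = sd(x_new))                      # matches the targets exactly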
Compute the Buishand Range Test, Pettitt Test, SNHT, Student t-test, and Mann-Whitney Rank Test to identify breakpoints in series. NA values are allowed in all functions. Since all of the mentioned methods identify only one breakpoint in a series, a general function to look for N breakpoints is given. Also, the Yamamoto test for climate jump is available. Alexandersson, H. (1986) <doi:10.1002/joc.3370060607>, Buishand, T. (1982) <doi:10.1016/0022-1694(82)90066-X>, Hurtado, S. I., Zaninelli, P. G., & Agosta, E. A. (2020) <doi:10.1016/j.atmosres.2020.104955>, Mann, H. B., Whitney, D. R. (1947) <doi:10.1214/aoms/1177730491>, Pettitt, A. N. (1979) <doi:10.2307/2346729>, Ruxton, G. D. (2006) <doi:10.1093/beheco/ark016>, Yamamoto, R., Iwashima, T., Kazadi, S. N., & Hoshiai, M. (1985) <doi:10.2151/jmsj1965.63.6_1157>.
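To make the kind of test concrete, the Pettitt (1979) statistic for a single breakpoint can be written out in a few lines; this is a self-contained sketch with the usual approximate p-value, not this package's own implementation:

pettitt_stat <- function(x) {
  n <- length(x)
  U <- sapply(seq_len(n - 1), function(t) sum(sign(outer(x[1:t], x[(t + 1):n], "-"))))
  K <- max(abs(U))                       # test statistic
  t_hat <- which.max(abs(U))             # most likely breakpoint position
  p <- 2 * exp(-6 * K^2 / (n^3 + n^2))   # standard approximation to the p-value
  list(change_point = t_hat, K = K, p_value = min(1, p))
}
set.seed(1)
series <- c(rnorm(30, 0), rnorm(30, 1.5))   # shift in mean after observation 30
pettitt_stat(series)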
This package contains all of the functions necessary for the complete analysis of a continuous glucose monitoring (CGM) study and can be applied to data measured by various existing CGM devices such as FreeStyle Libre, Glutalor, Dexcom and Medtronic. It reads a series of data files, is able to convert various formats of time stamps, can deal with missing values, calculates both regular statistics and nonlinear statistics, and conducts group comparison. It also displays results in a concise format. The package contains two unique features new to CGM analysis: one is the implementation of the strictly standardized mean difference and the associated class of effect size; the other is a new type of plot called the antenna plot. It corresponds to the article by Zhang XD (2018) <doi:10.1093/bioinformatics/btx826>, 'CGManalyzer: an R package for analyzing continuous glucose monitoring studies'.
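The strictly standardized mean difference mentioned above has a simple closed form; the sketch below spells it out for two groups of per-subject CGM summaries (illustrative data, not this package's function names):

ssmd <- function(a, b) (mean(a) - mean(b)) / sqrt(var(a) + var(b))
set.seed(3)
glucose_group1 <- rnorm(50, mean = 6.2, sd = 1.1)   # e.g., mean glucose per subject, group 1
glucose_group2 <- rnorm(60, mean = 7.0, sd = 1.4)   # group 2
ssmd(glucose_group1, glucose_group2)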
Detection of differentially expressed genes (DEGs) from the comparison of two biological conditions (treated vs. untreated, diseased vs. normal, mutant vs. wild-type) among different levels of gene expression (transcriptome, translatome, proteome), using several statistical methods: Rank Product, Translational Efficiency, t-test, Limma, ANOTA, DESeq, edgeR. Results can be plotted with scatterplots, histograms, MA plots, standard deviation (SD) plots and coefficient of variation (CV) plots. Detection of significantly enriched post-transcriptional regulatory factors (RBPs, miRNAs, etc.) and Gene Ontology terms in the lists of DEGs previously identified for the two expression levels. Comparison of GO terms enriched only in one of the levels or in both. Calculation of the semantic similarity score between the lists of enriched GO terms coming from the two expression levels. Visual examination and comparison of the enriched terms with heatmaps, radar plots and barplots.
Calculate Overall Survival or Recurrence-Free Survival for breast cancer patients, using 'NHS Predict'. The time interval for the estimation can be set up to 15 years, with a default of 10. Incremental therapy benefits are estimated for hormone therapy, chemotherapy, trastuzumab, and bisphosphonates. An additional function, suited for SCAN audits, features a more user-friendly version of the code, with fewer inputs, but requires correctly standardised inputs. This work is not affiliated with the development of 'NHS Predict' and its underlying statistical model. Details on 'NHS Predict' can be found at <doi:10.1186/bcr2464>, and its web version at <https://breast.predict.nhs.uk/>. A small dataset of 50 fictional patient observations is provided for the purpose of running examples with the two main functions, and an additional dataset is provided for running an example with the dedicated SCAN function.
This package provides estimates for the bivariate and trivariate distribution functions and bivariate and trivariate survival functions for censored gap times. Two approaches, using existing methodologies, are considered: (i) Lin's estimator, which is based on extending the Kaplan-Meier estimator of the distribution function for the first event time and using Inverse Probability of Censoring Weights for the second time (Lin DY, Sun W, Ying Z (1999) <doi:10.1093/biomet/86.1.59>), and (ii) another estimator based on Kaplan-Meier weights (Una-Alvarez J, Meira-Machado L (2008) <https://w3.math.uminho.pt/~lmachado/Biometria_conference.pdf>). The proposed methods are landmark estimators based on a subsampling approach, and an estimator based on a weighted cumulative hazard estimator. The package also provides nonparametric estimators conditional on a given continuous covariate. All these methods have been submitted for publication.
This is a heavily modified fork of http://github.com/defunkt/colored gem, with many sensible pull requests combined. Since the authors of the original gem no longer support it, this might, perhaps, be considered a good alternative.
A simple gem that adds various color methods to the String class; it can be used as follows:
require 'colored2'
puts 'this is red'.red
puts 'this is red with a yellow background'.red.on.yellow
puts 'this is red with an italic'.red.italic
puts 'this is green bold'.green.bold << ' and regular'.green
puts 'this is really bold blue on white but reversed'.bold.blue.on.white.reversed
puts 'this is regular, but '.red! << 'this is red '.yellow! << ' and yellow.'.no_color!
puts ('this is regular, but '.red! do 'this is red '.yellow! do ' and yellow.'.no_color! end end)
Finds single- and two-arm designs using stochastic curtailment, as described by Law et al. (2022) <doi:10.1080/10543406.2021.2009498> and Law et al. (2021) <doi:10.1002/pst.2067> respectively. Designs can be single-stage or multi-stage. Non-stochastic curtailment is possible as a special case. Desired error rates, maximum sample size and lower and upper anticipated response rates are supplied as input, and suitable designs are returned with their operating characteristics. Stopping boundaries and visualisations are also available. The package can find designs using other approaches, for example designs by Simon (1989) <doi:10.1016/0197-2456(89)90015-9> and Mander and Thompson (2010) <doi:10.1016/j.cct.2010.07.008>. Other features: compare and visualise designs using a weighted sum of expected sample sizes under the null and alternative hypotheses and the maximum sample size; visualise any binary outcome design.
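To show what "operating characteristics" means here, the sketch below computes them from first principles for a Simon (1989) two-stage design; this is a self-contained illustration, not this package's API, and the design (r1, n1, r, n) is only an example:

simon_oc <- function(r1, n1, r, n, p) {
  pet <- pbinom(r1, n1, p)                # probability of early termination after stage 1
  x1 <- (r1 + 1):n1                       # stage-1 response counts that continue to stage 2
  p_reject <- sum(dbinom(x1, n1, p) * (1 - pbinom(r - x1, n - n1, p)))
  c(P_reject = p_reject, PET = pet, E_N = n1 + (1 - pet) * (n - n1))
}
simon_oc(r1 = 1, n1 = 10, r = 5, n = 29, p = 0.1)   # type I error, PET and E(N) under H0
simon_oc(r1 = 1, n1 = 10, r = 5, n = 29, p = 0.3)   # power, PET and E(N) under H1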
Facilitates creation and manipulation of metric graphs, such as street or river networks. Further facilitates operations and visualizations of data on metric graphs, and the creation of a large class of random fields and stochastic partial differential equations on such spaces. These random fields can be used for simulation, prediction and inference. In particular, linear mixed effects models including random field components can be fitted to data based on computationally efficient sparse matrix representations. Interfaces to the R packages INLA and inlabru are also provided, which facilitate working with Bayesian statistical models on metric graphs. The main references for the methods are Bolin, Simas and Wallin (2024) <doi:10.3150/23-BEJ1647>, Bolin, Kovacs, Kumar and Simas (2023) <doi:10.1090/mcom/3929> and Bolin, Simas and Wallin (2023) <doi:10.48550/arXiv.2304.03190> and <doi:10.48550/arXiv.2304.10372>.
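The sketch below is a tentative illustration of building a small metric graph, based on the R6 constructor shown in the package vignettes (metric_graph$new() taking a list of edge coordinate matrices); treat the exact call signature as an assumption and check the current documentation:

library(MetricGraph)
edge1 <- rbind(c(0, 0), c(1, 0))
edge2 <- rbind(c(1, 0), c(1, 1))
edge3 <- rbind(c(1, 1), c(0, 0))
graph <- metric_graph$new(edges = list(edge1, edge2, edge3))   # assumed constructor
graph$plot()                                                   # assumed plotting method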
This package provides a collection of R functions that are widely used by the Petersen Lab. Included are functions for various purposes, including evaluating the accuracy of judgments and predictions, performing scoring of assessments, generating correlation matrices, conversion of data between various types, data management, psychometric evaluation, extensions related to latent variable modeling, various plotting capabilities, and other miscellaneous useful functions. By making the package available, we hope to make our methods reproducible and replicable by others and to help others perform their data processing and analysis methods more easily and efficiently. The codebase is provided in Petersen (2024) <doi:10.5281/zenodo.7602890> and on CRAN: <doi:10.32614/CRAN.package.petersenlab>. The package is described in "Principles of Psychological Assessment: With Applied Examples in R" (Petersen, 2024, 2025) <doi:10.1201/9781003357421>, <doi:10.25820/work.007199>, <doi:10.5281/zenodo.6466589>.
Random simulations of fuzzy numbers are still a challenging problem. The aim of this package is to provide the respective procedures to simulate fuzzy random variables, especially in the case of piecewise linear fuzzy numbers (PLFNs; see Coroianu et al. (2013) <doi:10.1016/j.fss.2013.02.005> for further details). Additionally, the special resampling algorithms known as the epistemic bootstrap are provided (see Grzegorzewski and Romaniuk (2022) <doi:10.34768/amcs-2022-0021>, Grzegorzewski and Romaniuk (2022) <doi:10.1007/978-3-031-08974-9_39>, Romaniuk et al. (2024) <doi:10.32614/RJ-2024-016>) together with functions to apply statistical tests and estimate various characteristics based on the epistemic bootstrap. The package also includes a real-life data set of epistemic fuzzy triangular numbers. The fuzzy numbers used in this package are consistent with the FuzzyNumbers package.
This package contains functions for the classification and ranking of top candidate features, reconstruction of networks from adjacency matrices and data frames, analysis of the topology of the network and calculation of centrality measures, and identification of the most influential nodes. A function is also provided for running the SIRIR model, which combines the leave-one-out cross-validation technique with the conventional SIR model, on a network to rank the true influence of vertices in an unsupervised manner. Additionally, some functions have been provided for the assessment of the dependence and correlation of two network centrality measures as well as the conditional probability of deviation from their corresponding means in opposite directions. Fred Viole and David Nawrocki (2013, ISBN:1490523995). Csardi G, Nepusz T (2006). "The igraph software package for complex network research." InterJournal, Complex Systems, 1695. Adopted algorithms and sources are referenced in the function documentation.
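As a generic illustration of the network reconstruction and centrality steps, the following uses only 'igraph' (which the description above cites); the package layers its own integrative measures on top of calls like these, and the toy edge list is invented:

library(igraph)
edges <- data.frame(from = c("A", "A", "B", "C", "D"),
                    to   = c("B", "C", "C", "D", "A"))
g <- graph_from_data_frame(edges, directed = FALSE)   # network from a data frame
data.frame(node = V(g)$name,
           degree = degree(g),
           betweenness = betweenness(g),
           closeness = closeness(g))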
Generates the following sequential two-arm experimental designs: (1) completely randomized (Bernoulli), (2) balanced completely randomized, (3) Efron's (1971) Biased Coin, (4) Atkinson's (1982) Covariate-Adjusted Biased Coin, (5) Kapelner and Krieger's (2014) Covariate-Adjusted Matching on the Fly, (6) Kapelner and Krieger's (2021) CARA Matching on the Fly with Differential Covariate Weights (Naive), (7) Kapelner and Krieger's (2021) CARA Matching on the Fly with Differential Covariate Weights (Stepwise); and also provides the following types of inference: (1) estimation (with both Z-style estimators and OLS estimators), (2) frequentist testing (via asymptotic distribution results and via employing the nonparametric randomization test) and (3) frequentist confidence intervals (only under the superpopulation sampling assumption currently). Details can be found in our publication: Kapelner and Krieger "A Matching Procedure for Sequential Experiments that Iteratively Learns which Covariates Improve Power" (2020) <arXiv:2010.05980>.
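One of the simpler allocation rules listed above, Efron's (1971) biased coin, can be written out directly; this self-contained sketch illustrates the rule itself rather than this package's interface:

efron_biased_coin <- function(n, p = 2/3) {
  arm <- integer(n)
  for (i in seq_len(n)) {
    imbalance <- sum(arm == 1) - sum(arm == 2)
    prob_arm1 <- if (imbalance < 0) p else if (imbalance > 0) 1 - p else 0.5
    arm[i] <- if (runif(1) < prob_arm1) 1L else 2L   # favour the under-represented arm
  }
  arm
}
set.seed(10)
table(efron_biased_coin(100))   # allocations stay close to 50/50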
Seq2pathway is a novel tool for functional gene-set (or pathway) analysis of next-generation sequencing data, consisting of the "seq2gene" and "gene2pathway" components. The seq2gene component links sequence-level measurements of genomic regions (including SNPs or point mutation coordinates) to gene-level scores, and the gene2pathway component summarizes gene scores into pathway scores for each sample. The seq2gene component can assign both coding and non-exon regions to a broader range of neighboring genes than only the nearest one, thus facilitating the study of functional non-coding regions. The gene2pathway component takes into account the quantity of significance for gene members within a pathway compared to those outside the pathway. The output of seq2pathway is a general structure of quantitative pathway-level scores, thus allowing one to functionally interpret datasets such as RNA-seq, ChIP-seq, GWAS, and data derived from other next-generation sequencing experiments.
These 'Rcpp'-based functions compute the efficient score statistics for grouped time-to-event data (Prentice and Gloeckler, 1978), with the optional inclusion of baseline covariates. Functions for estimating the parameter of interest and nuisance parameters, including baseline hazards, using maximum likelihood are also provided. A parallel set of functions allows for the incorporation of the family structure of related individuals (e.g., trios). Note that the current implementation of the frailty model (Ripatti and Palmgren, 2000) is sensitive to departures from model assumptions, and should be considered experimental. For these data, the exact proportional-hazards-model-based likelihood is computed by evaluating a multiple-variable integral. The integration is accomplished using the Cuba library (Hahn, 2005), and the source files are included in this package. The maximization process is carried out using Brent's algorithm, with the C++ code file from John Burkardt and John Denker (Brent, 2002).
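For orientation, the grouped-time proportional hazards model of Prentice and Gloeckler (1978) is equivalent to a binomial model with a complementary log-log link fitted to person-interval data; the base-R sketch below illustrates that underlying model with simulated data and is not a substitute for this package's efficient score functions:

set.seed(5)
n <- 300
x <- rbinom(n, 1, 0.5)                                        # a baseline covariate
time <- pmin(rgeom(n, prob = plogis(-2 + 0.7 * x)) + 1, 6)    # grouped event times, capped at 6
event <- as.integer(time < 6)                                 # administrative censoring at 6
pp <- do.call(rbind, lapply(seq_len(n), function(i) {
  data.frame(id = i, interval = seq_len(time[i]), x = x[i],
             y = as.integer(seq_len(time[i]) == time[i] & event[i] == 1))
}))
fit <- glm(y ~ factor(interval) + x, family = binomial(link = "cloglog"), data = pp)
summary(fit)$coefficients["x", ]                              # log hazard ratio for x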
Fast, optimal, and reproducible clustering algorithms for circular, periodic, or framed data. The algorithms introduced here are based on a core algorithm for optimal framed clustering that the authors have developed (Debnath & Song 2021) <doi:10.1109/TCBB.2021.3077573>. The runtime of these algorithms is O(K N log^2 N), where K is the number of clusters and N is the number of circular data points. On a desktop computer using a single processor core, millions of data points can be grouped into a few clusters within seconds. One can apply the algorithms to characterize events along circular DNA molecules, circular RNA molecules, and circular genomes of bacteria, chloroplasts, and mitochondria. One can also cluster climate data along any given longitude or latitude. Periodic data clustering can be formulated as circular clustering. The algorithms offer a general high-performance solution to circular, periodic, or framed data clustering.
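The reformulation of periodic data as circular data is straightforward to sketch; the lines below map day-of-year onto a circle and compute the wrap-around distance that circular clustering works with (this is an illustration only, and the package's own function names are deliberately omitted):

period <- 365
days <- c(3, 10, 180, 185, 190, 355, 360)     # events clustered near New Year and mid-year
theta <- 2 * pi * days / period               # angular positions on the circle
xy <- cbind(cos(theta), sin(theta))           # embedding on the unit circle
circ_dist <- function(a, b, period) {         # wrap-around (circular) distance
  d <- abs(a - b) %% period
  pmin(d, period - d)
}
circ_dist(3, 360, period)                     # 8 days apart once the wrap-around is respected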