An implementation of the methodology described in Petersen and Mueller (2016) <doi:10.1214/15-AOS1363> for the functional data analysis of samples of density functions. Densities are first transformed to their corresponding log quantile densities, followed by ordinary Functional Principal Components Analysis (FPCA). Transformation modes of variation yield improved interpretation of the variability in the data as compared to FPCA on the densities themselves. The standard fraction of variance explained (FVE) criterion commonly used for functional data is adapted to the transformation setting, also allowing for an alternative quantification of variability for density data through the Wasserstein metric of optimal transport.
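As a hedged illustration of the transformation step (not this package's own interface), the log quantile density psi(t) = -log f(Q(t)) can be computed from a density sampled on a grid in a few lines of base R; the function name, grid, and interpolation choices below are assumptions made for the sketch.

    # Minimal base-R sketch of the log quantile density (LQD) transform;
    # illustrative only, not this package's API.
    lqd_transform <- function(x, f, tgrid = seq(0.01, 0.99, length.out = 99)) {
      Fx <- cumsum(f) * (x[2] - x[1])                        # crude CDF via Riemann sum
      Fx <- Fx / max(Fx)                                     # normalize onto [0, 1]
      Q  <- approx(Fx, x, xout = tgrid, ties = "ordered")$y  # quantile function Q(t)
      fQ <- approx(x, f, xout = Q)$y                         # density at the quantiles
      -log(fQ)                                               # psi(t) = -log f(Q(t))
    }
    xg  <- seq(-4, 4, length.out = 401)
    psi <- lqd_transform(xg, dnorm(xg))                      # LQD of a standard normal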
Generates/modifies RNA-seq data for use in simulations. We provide a suite of functions that will add a known amount of signal to a real RNA-seq dataset. The advantage of this approach over simulating from a theoretical distribution is that the common (and often annoying) features of real data are preserved, giving a more realistic evaluation of your method. The main functions are select_counts(), thin_diff(), thin_lib(), thin_gene(), thin_2group(), thin_all(), and effective_cor(). See Gerard (2020) <doi:10.1186/s12859-020-3450-9> for details on the implemented methods.
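A toy base-R sketch of the binomial thinning idea behind this approach (the data, effect sizes, and loop below are invented for illustration and do not use the package's API): counts in one group are downsampled per gene so that a chosen log2 fold change is injected while the quirks of the real data survive.

    # Binomial thinning sketch: inject known log2 fold changes into counts.
    set.seed(1)
    counts <- matrix(rnbinom(200, mu = 50, size = 5), nrow = 20)  # 20 genes x 10 samples
    group  <- rep(0:1, each = 5)
    lfc    <- rnorm(20)                      # target log2 fold changes per gene
    p      <- 2^(-abs(lfc))                  # per-gene thinning probabilities
    thinned <- counts
    for (g in 1:20) {
      # thin the group that should end up lower, so the induced difference is lfc[g]
      cols <- if (lfc[g] > 0) which(group == 0) else which(group == 1)
      thinned[g, cols] <- rbinom(length(cols), counts[g, cols], p[g])
    }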
Likelihood evaluations for stationary Gaussian time series are typically obtained via the Durbin-Levinson algorithm, which scales as O(n^2) in the number of time series observations. This package provides a "superfast" O(n log^2 n) algorithm written in C++, crossing over with Durbin-Levinson around n = 300. Efficient implementations of the score and Hessian functions are also provided, leading to superfast versions of inference algorithms such as Newton-Raphson and Hamiltonian Monte Carlo. The C++ code provides a Toeplitz matrix class packaged as a header-only library, to simplify low-level usage in other packages and outside of R.
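For orientation, here is a hedged base-R sketch of the O(n^2) Durbin-Levinson recursion that the superfast algorithm replaces; the function and variable names are illustrative, not this package's API.

    # Gaussian log-likelihood of a mean-zero stationary series via the
    # O(n^2) Durbin-Levinson recursion; acvf = gamma(0), ..., gamma(n-1).
    dl_loglik <- function(x, acvf) {
      n   <- length(x)
      phi <- numeric(n)                       # prediction coefficients
      v   <- acvf[1]                          # innovation variance v_0 = gamma(0)
      ll  <- -0.5 * (log(2 * pi * v) + x[1]^2 / v)
      for (k in 1:(n - 1)) {
        s <- if (k > 1) sum(phi[1:(k - 1)] * acvf[k:2]) else 0
        phikk <- (acvf[k + 1] - s) / v        # reflection coefficient phi_{k,k}
        if (k > 1) phi[1:(k - 1)] <- phi[1:(k - 1)] - phikk * phi[(k - 1):1]
        phi[k] <- phikk
        v <- v * (1 - phikk^2)                # update innovation variance
        pred <- sum(phi[1:k] * x[k:1])        # one-step prediction of x[k+1]
        ll <- ll - 0.5 * (log(2 * pi * v) + (x[k + 1] - pred)^2 / v)
      }
      ll
    }
    x <- as.numeric(arima.sim(list(ar = 0.5), 200))
    dl_loglik(x, 0.5^(0:199) / (1 - 0.5^2))   # AR(1) autocovariances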
An implementation of the representation-dependent gene-level operations of grammar-based genetic programming whose genes are derivation trees of a context-free grammar: initialization of a gene with a complete random derivation tree and decoding of a derivation tree. Crossover is implemented by exchanging subtrees, as sketched below. Depth bounds for the minimal and maximal depth of the roots of the subtrees exchanged by crossover can be set. Mutation is implemented by replacing a subtree with a random subtree. The depth of the random subtree and the insertion node are configurable. For details, see Geyer-Schulz (1997, ISBN:978-3-7908-0830-X).
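A toy sketch of subtree-exchange crossover on trees stored as nested lists (the representation, stopping rule, and depth handling are assumptions for illustration, not the package's actual derivation-tree machinery):

    # Pick a random node path in a nested-list tree, then swap the subtrees.
    random_path <- function(tree, path = integer(0)) {
      if (!is.list(tree) || runif(1) < 0.3) return(path)   # stop at this node
      i <- sample(length(tree), 1)
      random_path(tree[[i]], c(path, i))
    }
    crossover_subtrees <- function(t1, t2) {
      p1 <- random_path(t1); p2 <- random_path(t2)
      s1 <- if (length(p1)) t1[[p1]] else t1               # extract subtrees
      s2 <- if (length(p2)) t2[[p2]] else t2
      if (length(p1)) t1[[p1]] <- s2 else t1 <- s2         # exchange them
      if (length(p2)) t2[[p2]] <- s1 else t2 <- s1
      list(t1, t2)
    }
    t1 <- list("+", list("*", "x", "y"), "z")
    t2 <- list("-", "a", list("/", "b", "c"))
    crossover_subtrees(t1, t2)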
High-throughput experimental data are accumulating exponentially in public databases. However, mining valid scientific discoveries from these abundant resources is hampered by technical artifacts and inherent biological heterogeneity. The former are usually termed "batch effects," and the latter is often modelled by "subtypes." The R package BUScorrect fits a Bayesian hierarchical model, the Batch-effects-correction-with-Unknown-Subtypes (BUS) model, to correct batch effects in the presence of unknown subtypes. BUS is capable of (a) correcting batch effects explicitly, (b) grouping samples that share similar characteristics into subtypes, (c) identifying features that distinguish subtypes, and (d) enjoying linear-order computational complexity.
Belief propagation methods in Bayesian networks to propagate evidence through the network. The implementation of these methods is based on the article: Cowell, RG (2005). Local Propagation in Conditional Gaussian Bayesian Networks <https://www.jmlr.org/papers/v6/cowell05a.html>. For details please see Yu et al. (2020) BayesNetBP: An R Package for Probabilistic Reasoning in Bayesian Networks <doi:10.18637/jss.v094.i03>. The optional cyjShiny package for running the Shiny app is available at <https://github.com/cytoscape/cyjShiny>. Please see the example in the documentation of the runBayesNetApp function for installing the cyjShiny package from GitHub.
Facilitates many of the analyses performed in studies of behavioral economic demand. The package supports commonly used options for modeling operant demand including (1) data screening proposed by Stein, Koffarnus, Snider, Quisenberry, & Bickel (2015; <doi:10.1037/pha0000020>), (2) fitting models of demand such as linear (Hursh, Raslear, Bauman, & Black, 1989, <doi:10.1007/978-94-009-2470-3_22>), exponential (Hursh & Silberberg, 2008, <doi:10.1037/0033-295X.115.1.186>) and modified exponential (Koffarnus, Franck, Stein, & Bickel, 2015, <doi:10.1037/pha0000045>), and (3) calculating numerous measures relevant to applied behavioral economists (Intensity, Pmax, Omax). Also supports plotting and comparing data.
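As a hedged, self-contained illustration of the exponential model, here it is fit with base R's nls() on simulated data (the data, starting values, and parameter names are assumptions; this is not the package's own fitting routine):

    # Hursh & Silberberg (2008): log10(Q) = log10(Q0) + k * (exp(-alpha * Q0 * C) - 1)
    set.seed(42)
    price <- c(0.1, 0.25, 0.5, 1, 2, 4, 8, 16)          # unit price C
    k <- 3                                              # range constant (often fixed)
    logq <- log10(100) + k * (exp(-0.004 * 100 * price) - 1) + rnorm(8, sd = 0.05)
    fit <- nls(logq ~ log10(Q0) + k * (exp(-A * Q0 * price) - 1),
               start = list(Q0 = 80, A = 0.001))
    coef(fit)   # Q0: demand intensity; A (alpha): rate of change in elasticity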
Processes raw force-plate data (txt files) by segmenting them into trials and, if needed, calculating (user-defined) descriptive statistics of variables for user-defined time bins (relative to trigger onsets) for each trial. When segmenting the data, a baseline correction, a filter, and data imputation can be applied if needed. Experimental data can also be processed and combined with the segmented force-plate data. This procedure is suggested by Johannsen et al. (2023) <doi:10.6084/m9.figshare.22190155>, and some of the options (e.g., choice of low-pass filter) are also suggested by Winter (2009) <doi:10.1002/9780470549148>.
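A hedged sketch of the low-pass filtering step using the 'signal' package (the sampling rate, cut-off frequency, and filter order below are assumptions; this package's own interface and defaults may differ):

    library(signal)
    fs <- 1000                                   # sampling rate in Hz (assumed)
    bf <- butter(2, 10 / (fs / 2), type = "low") # 2nd-order Butterworth, 10 Hz cut-off
    raw <- cumsum(rnorm(5000))                   # toy force-plate channel
    smoothed <- filtfilt(bf, raw)                # zero-phase forward-backward filtering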
This package provides plotting functions for visualizing pedigrees and family trees. The package complements the behavior genetics package BGmisc [Garrison et al. (2024) <doi:10.21105/joss.06203>] by rendering pedigrees using the ggplot2 framework. Features include support for duplicated individuals, complex mating structures, integration with simulated pedigrees, and layout customization. Due to the impending deprecation of kinship2, version 1.0 incorporates the layout helper functions from kinship2. The pedigree alignment algorithms are adapted from kinship2 [Sinnwell et al. (2014) <doi:10.1159/000363105>]. We gratefully acknowledge the original authors, Jason Sinnwell, Terry Therneau, Daniel Schaid, and Elizabeth Atkinson, for their foundational work.
Quantification is a prominent machine learning task that has received an increasing amount of attention in recent years. The objective is to predict the class distribution of a data sample. This package collects machine learning algorithms for class distribution estimation, drawn from different paradigms of quantification. These methods are described in the paper: A. Maletzke, W. Hassan, D. dos Reis, and G. Batista. The importance of the test set size in quantification assessment. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pages 2640-2646, 2020. <doi:10.24963/ijcai.2020/366>.
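For context, here is a hedged base-R sketch of two classical quantifiers from this literature, Classify & Count and Adjusted Count (the function name and inputs are illustrative, not this package's API):

    quantify_acc <- function(scores, threshold, tpr, fpr) {
      cc  <- mean(scores >= threshold)       # Classify & Count: raw positive rate
      acc <- (cc - fpr) / (tpr - fpr)        # Adjusted Count corrects classifier bias
      c(CC = cc, ACC = min(max(acc, 0), 1))  # clip the estimate to [0, 1]
    }
    # tpr and fpr would normally be estimated by cross-validation on training data
    quantify_acc(scores = runif(1000), threshold = 0.5, tpr = 0.85, fpr = 0.15)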
Implement multiverse style analyses (Steegen S., Tuerlinckx F, Gelman A., Vanpaemal, W., 2016) <doi:10.1177/1745691616658637> to show the robustness of statistical inference. Multiverse analysis is a philosophy of statistical reporting where paper authors report the outcomes of many different statistical analyses in order to show how fragile or robust their findings are. The multiverse package (Sarma A., Kale A., Moon M., Taback N., Chevalier F., Hullman J., Kay M., 2021) <doi:10.31219/osf.io/yfbwm> allows users to concisely and flexibly implement multiverse-style analysis, which involve declaring alternate ways of performing an analysis step, in R and R Notebooks.
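A minimal base-R rendering of the idea (the package itself offers a dedicated, more concise declaration syntax; the data and analysis choices below are invented for illustration): enumerate all combinations of analysis choices and fit each "universe" on the same data.

    specs <- expand.grid(outlier_cut = c(2.5, 3), covariate = c("none", "age"),
                         stringsAsFactors = FALSE)
    d <- data.frame(y = rnorm(100), x = rnorm(100), age = rnorm(100))
    est <- sapply(seq_len(nrow(specs)), function(i) {
      keep <- abs(d$y - mean(d$y)) / sd(d$y) < specs$outlier_cut[i]   # exclusion rule
      f <- if (specs$covariate[i] == "age") y ~ x + age else y ~ x    # model choice
      coef(lm(f, data = d[keep, ]))["x"]                              # effect of interest
    })
    cbind(specs, estimate = est)   # one row per universe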
This package implements propensity score weighting methods for estimating counterfactual survival functions, marginal hazard ratios, and weighted Kaplan-Meier and cumulative risk curves in observational studies with time-to-event outcomes. Supports binary and multiple treatment groups with inverse probability of treatment weighting (IPW), overlap weighting (OW), and average treatment effect on the treated (ATT). Includes symmetric trimming (Crump extension) for extreme propensity scores. Variance estimation via analytical M-estimation or bootstrap. Methods based on Li et al. (2018) <doi:10.1080/01621459.2016.1260466>, Li & Li (2019) <doi:10.1214/19-AOAS1282>, and Cheng et al. (2022) <doi:10.1093/aje/kwac043>.
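A hedged sketch of the IPW ingredient using the 'survival' package (the simulated data, variable names, and model are assumptions; this package's own interface differs): estimate propensity scores, form inverse-probability weights, and pass them to a weighted Kaplan-Meier fit.

    library(survival)
    set.seed(1)
    d <- data.frame(x = rnorm(200))
    d$z <- rbinom(200, 1, plogis(d$x))                     # treatment depends on x
    d$time <- rexp(200, rate = 0.5 + 0.3 * d$z)            # event times
    d$status <- rbinom(200, 1, 0.8)                        # event indicator
    ps  <- fitted(glm(z ~ x, family = binomial, data = d)) # propensity score e(x)
    d$w <- ifelse(d$z == 1, 1 / ps, 1 / (1 - ps))          # IPW weights
    km  <- survfit(Surv(time, status) ~ z, data = d, weights = w)  # weighted KM curves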
Representation-dependent gene level operations of a genetic algorithm with binary coded genes: Initialization of random binary genes, several gene maps for binary genes, several mutation operators, several crossover operators with 1 and 2 kids, replication pipelines for 1 and 2 kids, and, last but not least, function factories for configuration. See Goldberg, D. E. (1989, ISBN:0-201-15767-5). For crossover operators, see Syswerda, G. (1989, ISBN:1-55860-066-3), Spears, W. and De Jong, K. (1991, ISBN:1-55860-208-9). For mutation operators, see Stanhope, S. A. and Daida, J. M. (1996, ISBN:0-18-201-031-7).
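As a hedged toy version of two of these operators (bit-flip mutation and one-point crossover with two kids; the names and defaults are invented and do not mirror the package's function factories):

    mutate_bits <- function(gene, p = 0.01) {
      flip <- runif(length(gene)) < p
      ifelse(flip, 1L - gene, gene)           # flip each bit independently with prob p
    }
    crossover_1point <- function(a, b) {
      cut <- sample(length(a) - 1, 1)         # crossover point
      list(kid1 = c(a[1:cut], b[(cut + 1):length(b)]),
           kid2 = c(b[1:cut], a[(cut + 1):length(a)]))  # 2-kid variant
    }
    set.seed(7)
    g1 <- sample(0:1, 16, replace = TRUE)
    g2 <- sample(0:1, 16, replace = TRUE)
    crossover_1point(mutate_bits(g1), mutate_bits(g2))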
Representing nucleotide modifications in a nucleotide sequence is usually done via special characters from a number of sources. This represents a challenge to work with in R and the Biostrings package. The Modstrings package implements this functionality for RNA and DNA sequences containing modified nucleotides by translating the characters internally in order to work with the infrastructure of the Biostrings package. For this, the ModRNAString and ModDNAString classes and derivatives, as well as functions to construct and modify these objects despite the encoding issues, are implemented. In addition, the conversion from sequences to list-like location information (and the reverse operation) is implemented as well.
This package implements methods for mediation analysis with missing data and non-normal data. For missing data, four methods are available: listwise deletion, pairwise deletion, multiple imputation (MI), and the two-stage maximum likelihood (TS-ML) algorithm. For MI and TS-ML, auxiliary variables can be included to handle missing data. For handling non-normal data, bootstrap and two-stage robust methods can be used. Technical details of the methods can be found in Zhang and Wang (2013, <doi:10.1007/s11336-012-9301-5>), Zhang (2014, <doi:10.3758/s13428-013-0424-0>), and Yuan and Zhang (2012, <doi:10.1007/s11336-012-9282-4>).
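A hedged base-R sketch of the bootstrap route to the indirect effect a*b on complete, simulated data (the package's functions add the missing-data handling and robust estimation on top of this idea):

    set.seed(1)
    n <- 200
    x <- rnorm(n); m <- 0.5 * x + rnorm(n); y <- 0.4 * m + 0.2 * x + rnorm(n)
    d <- data.frame(x, m, y)
    ab <- replicate(2000, {
      i <- sample(n, replace = TRUE)
      a <- coef(lm(m ~ x, data = d[i, ]))["x"]      # path a: x -> m
      b <- coef(lm(y ~ m + x, data = d[i, ]))["m"]  # path b: m -> y given x
      a * b                                         # indirect effect
    })
    quantile(ab, c(0.025, 0.975))                   # percentile bootstrap CI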
Implementation of statistical tools for the description and analysis of gene expression data based on the concept of data depth: scale curves for visualizing the dispersion of one or more groups of samples (e.g., types of tumors), a rank test to decide whether two groups of samples come from a single distribution, and two supervised classification techniques, the DS and TAD methods. All these techniques are based on the Modified Band Depth, a recent notion of depth whose low computational cost makes it well suited to high-dimensional data such as gene expression data.
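A hedged base-R sketch of the Modified Band Depth with bands defined by pairs of curves (J = 2); rows are samples and columns are genes, and the implementation below is illustrative, not the package's own:

    mbd <- function(X) {
      n <- nrow(X)
      depth <- numeric(n)
      pairs <- combn(n, 2)
      for (q in seq_len(ncol(pairs))) {
        lo <- pmin(X[pairs[1, q], ], X[pairs[2, q], ])
        hi <- pmax(X[pairs[1, q], ], X[pairs[2, q], ])
        # proportion of coordinates where each sample lies inside the band
        inside <- sweep(X, 2, lo, ">=") & sweep(X, 2, hi, "<=")
        depth <- depth + rowMeans(inside)
      }
      depth / ncol(pairs)
    }
    X <- matrix(rnorm(20 * 50), nrow = 20)
    order(mbd(X), decreasing = TRUE)[1:3]   # indices of the most central samples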
The package integrates three functional modules: genetic features, differential expression analysis, and non-additive expression analysis. It is suitable for RNA-seq and small RNA sequencing data. Two methods of non-additive expression analysis are provided: one calculates the additive (a) and dominance (d) effects; the other evaluates expression-level dominance by comparing the total expression of a gene in the hybrid offspring with its expression level in the parents. For RNA-seq data, non-additive expression analysis is currently applicable only to hybrid offspring species comprising two sub-genomes.
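A hedged sketch of one common parameterization of the first method (assumed here for illustration; the package may define a and d differently): with mid-parent value m = (P1 + P2) / 2, the additive effect is a = (P1 - P2) / 2 and the dominance deviation is d = F1 - m.

    p1 <- 120; p2 <- 80; f1 <- 115   # toy expression levels for parents and hybrid
    a  <- (p1 - p2) / 2              # additive effect
    d  <- f1 - (p1 + p2) / 2         # dominance deviation (non-additive component)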
Utilizing a combination of machine learning models (Random Forest, Naive Bayes, K-Nearest Neighbor, Support Vector Machines, Extreme Gradient Boosting, and Linear Discriminant Analysis) and a deep Artificial Neural Network model, MBMethPred can predict medulloblastoma subgroups, including wingless (WNT), sonic hedgehog (SHH), Group 3, and Group 4, from DNA methylation beta values. See Sharif Rahmani E, Lawarde A, Lingasamy P, Moreno SV, Salumets A and Modhukur V (2023), MBMethPred: a computational framework for the accurate classification of childhood medulloblastoma subgroups using data integration and AI-based approaches. Front. Genet. 14:1233657 <doi:10.3389/fgene.2023.1233657> for more details.
This package provides tools for econometric analysis and economic modelling with the traditional two-input Constant Elasticity of Substitution (CES) function and with nested CES functions with three and four inputs. The econometric estimation can be done by the Kmenta approximation, or non-linear least-squares using various gradient-based or global optimisation algorithms. Some of these algorithms can constrain the parameters to certain ranges, e.g. economically meaningful values. Furthermore, the non-linear least-squares estimation can be combined with a grid-search for the rho-parameter(s). The estimation methods are described in Henningsen et al. (2021) <doi:10.4337/9781788976480.00030>.
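A hedged illustration fitting the classical two-input CES function with plain nls() on simulated data (the package's own estimation routines add the Kmenta approximation, constrained and global optimisers, and the rho grid search; everything below is an invented toy example):

    # y = gamma * (delta * x1^(-rho) + (1 - delta) * x2^(-rho))^(-1/rho)
    set.seed(1)
    x1 <- runif(100, 1, 10); x2 <- runif(100, 1, 10)
    y  <- 2 * (0.6 * x1^(-0.5) + 0.4 * x2^(-0.5))^(-1 / 0.5) * exp(rnorm(100, sd = 0.02))
    fit <- nls(log(y) ~ log(g) - (1 / rho) * log(d * x1^(-rho) + (1 - d) * x2^(-rho)),
               start = list(g = 1, d = 0.5, rho = 0.3))
    coef(fit)   # g: efficiency; d: distribution parameter; rho: substitution parameter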
This package implements the SparseStep model for solving regression problems with a sparsity constraint on the parameters. The SparseStep regression model was proposed in Van den Burg, Groenen, and Alfons (2017) <arXiv:1701.06967>. In the model, a regularization term is added to the regression problem which approximates the counting norm of the parameters. By iteratively improving the approximation, a sparse solution to the regression problem can be obtained. This package implements both the standard SparseStep algorithm and a path algorithm that uses golden section search to determine solutions for different values of the regularization parameter.
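The surrogate for the counting norm can be sketched in a couple of lines (the form below, a sum of beta^2 / (beta^2 + gamma^2) terms, follows my reading of Van den Burg et al. 2017; the function name is invented):

    l0_approx <- function(beta, gamma) sum(beta^2 / (beta^2 + gamma^2))
    beta <- c(-2, 0, 0.5, 0, 1)
    sum(beta != 0)                                           # exact counting norm: 3
    sapply(c(1, 0.1, 0.001), function(g) l0_approx(beta, g)) # approaches 3 as gamma -> 0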
Sequential Monte Carlo (SMC) algorithms for fitting a generalised additive mixed model (GAMM) to surface-enhanced resonance Raman spectroscopy (SERRS) data, using the method of Moores et al. (2016) <arXiv:1604.07299>. Multivariate observations of SERRS are highly collinear and lend themselves to a reduced-rank representation. The GAMM separates the SERRS signal into three components: a sequence of Lorentzian, Gaussian, or pseudo-Voigt peaks; a smoothly-varying baseline; and additive white noise. The parameters of each component of the model are estimated iteratively using SMC. The posterior distributions of the parameters given the observed spectra are represented as a population of weighted particles.
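A hedged base-R sketch of the three-component decomposition on a toy spectrum (the peak locations, baseline shape, and noise level are invented for illustration):

    lorentz <- function(x, loc, scale, amp) amp * scale^2 / ((x - loc)^2 + scale^2)
    wn <- seq(500, 2000, by = 2)                       # wavenumber grid (1/cm)
    spec <- lorentz(wn, 1004, 15, 800) +               # Lorentzian peaks
            lorentz(wn, 1600, 20, 500) +
            1e-4 * (wn - 500)^2 +                      # smoothly varying baseline
            rnorm(length(wn), sd = 10)                 # additive white noise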
Stochastic blockmodeling of one-mode and linked networks as presented in Škulj and Žiberna (2022) <doi:10.1016/j.socnet.2022.02.001>. The optimization is done via the CEM (Classification Expectation Maximization) algorithm, which can be initialized by random partitions or the results of the k-means algorithm. The development of this package is financially supported by the Slovenian Research Agency (<https://www.arrs.si/>) within the research program P5-0168 and the research projects J7-8279 (Blockmodeling multilevel and temporal networks) and J5-2557 (Comparison and evaluation of different approaches to blockmodeling dynamic networks by simulations with application to Slovenian co-authorship networks).
This package implements an automated binning of numeric variables and factors with respect to a dichotomous target variable. Two approaches are provided: an implementation of fine and coarse classing that merges granular classes and levels step by step, and a tree-like approach that iteratively segments the initial bins via binary splits. Both procedures merge or split bins based on similar weight of evidence (WOE) values and stop via an information value (IV) based criterion. The package can be used with single variables or an entire data frame. It provides flexible tools for exploring different binning solutions and for deploying them to (new) data.
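The two quantities driving the merge/split decisions can be sketched in base R (the helper name and the good/bad coding are assumptions for illustration, not this package's API):

    woe_iv <- function(bin, target) {          # target: 0 = good, 1 = bad
      tab  <- table(bin, target)
      good <- tab[, "0"] / sum(tab[, "0"])     # distribution of goods across bins
      bad  <- tab[, "1"] / sum(tab[, "1"])     # distribution of bads across bins
      woe  <- log(good / bad)                  # weight of evidence per bin
      list(woe = woe, iv = sum((good - bad) * woe))  # information value
    }
    set.seed(1)
    x <- rnorm(1000); y <- rbinom(1000, 1, plogis(x))
    woe_iv(cut(x, quantile(x, 0:5 / 5), include.lowest = TRUE), y)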
Several functions are available for calculating the most widely used effect sizes (ES), along with their variances, confidence intervals, and p-values. The output includes the effect sizes d (mean difference), g (unbiased estimate of d), r (correlation coefficient), z (Fisher's z), and OR (odds ratio and log odds ratio). In addition, NNT (number needed to treat), U3, CLES (Common Language Effect Size), and Cliff's Delta are computed. This package uses recommended formulas as described in The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009). A free web application is available at <https://acdelre.github.io/apps/compute_es/>.
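For example, d and its small-sample correction g follow the textbook formulas (a hedged base-R sketch using an invented function name, not this package's function signatures):

    d_and_g <- function(m1, m2, sd1, sd2, n1, n2) {
      sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))  # pooled SD
      d  <- (m1 - m2) / sp                    # Cohen's d
      J  <- 1 - 3 / (4 * (n1 + n2 - 2) - 1)   # Hedges' bias-correction factor
      c(d = d, g = J * d)                     # g: unbiased estimate of d
    }
    d_and_g(m1 = 10, m2 = 8, sd1 = 4, sd2 = 4.2, n1 = 30, n2 = 30)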