Implementation of Johansen's general formulation of the Welch-James statistic with Approximate Degrees of Freedom, which makes it suitable for testing any linear hypothesis concerning cell means in univariate and multivariate mixed model designs when the data exhibit non-normality and heterogeneous variances. Some improvements, namely trimmed means and Winsorized variances, and bootstrapping for calculating an empirical critical value, have been added to the classical formulation. The code is based on a previous SAS implementation by L.M. Lix and H.J. Keselman, available at <http://supp.apa.org/psycarticles/supplemental/met_13_2_110/SAS_Program.pdf> and published in Keselman, H.J., Wilcox, R.R., and Lix, L.M. (2003) <DOI:10.1111/1469-8986.00060>.
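The two robust estimators named above, trimmed means and Winsorized variances, can be illustrated with a minimal Python sketch. It is not the package's code: the function names, the symmetric 20% trimming proportion, and the sample data are assumptions chosen for illustration.

```python
import math


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of the sorted values."""
    xs = sorted(values)
    g = math.floor(proportion * len(xs))  # observations removed from each tail
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)


def winsorized_variance(values, proportion=0.2):
    """Sample variance after pulling each tail in to the nearest retained value."""
    xs = sorted(values)
    n = len(xs)
    g = math.floor(proportion * n)
    low, high = xs[g], xs[n - g - 1]
    w = [min(max(x, low), high) for x in xs]  # Winsorized sample
    mean_w = sum(w) / n
    return sum((x - mean_w) ** 2 for x in w) / (n - 1)


# Hypothetical sample with one extreme value (40).
sample = [2, 4, 5, 6, 7, 8, 9, 10, 12, 40]
robust_location = trimmed_mean(sample)
robust_scale = winsorized_variance(sample)
```

With this sample the outlier 40 is dropped from the trimmed mean and replaced by 10 in the Winsorized variance, so neither estimate is driven by it.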
Original ctsem (continuous time structural equation modelling) functionality, based on the OpenMx software, as described in Driver, Oud, Voelkle (2017) <doi:10.18637/jss.v077.i05>, with updated details in the vignette. Combines stochastic differential equations representing latent processes with structural equation measurement models. These functions were split off from the main ctsem package, as the main package now uses the rstan package as a backend, offering estimation options from maximum likelihood to Bayesian. There are nevertheless use cases for the wide format SEM style approach as offered here, particularly when there are no individual differences in observation timing and the number of individuals is large. For the main ctsem package, see <https://cran.r-project.org/package=ctsem>.
This package provides methods for testing the equality between groups of estimated density functions. The package implements FDET (Fourier-based Density Equality Testing) and MDET (Moment-based Density Equality Testing), two new approaches introduced by the author. Both methods extend an earlier testing approach by Delicado (2007), "Functional k-sample problem when data are density functions" <doi:10.1007/s00180-007-0047-y>, which is referred to as DET (Density Equality Testing) in this package for clarity. FDET compares groups of densities based on their global shape using Fourier transforms, while MDET tests for differences in distributional moments. All methods are described in Anarat, Krutmann and Schwender (2025), "Testing for Differences in Extrinsic Skin Aging Based on Density Functions" (Submitted).
Constructing niche models and analyzing patterns of niche evolution. Acts as an interface for many popular modeling algorithms, and allows users to conduct Monte Carlo tests to address basic questions in evolutionary ecology and biogeography. See Warren, D.L., R.E. Glor, and M. Turelli (2008) <doi:10.1111/j.1558-5646.2008.00482.x>; Glor, R.E., and D.L. Warren (2011) <doi:10.1111/j.1558-5646.2010.01177.x>; Warren, D.L., R.E. Glor, and M. Turelli (2010) <doi:10.1111/j.1600-0587.2009.06142.x>; Cardillo, M., and D.L. Warren (2016) <doi:10.1111/geb.12455>; Warren, D.L., L.J. Beaumont, R. Dinnage, and J.B. Baumgartner (2019) <doi:10.1111/ecog.03900>.
The lipid scrambling activity of protein extracts and purified scramblases is often determined using a fluorescence-based assay involving many manual steps. flippant offers an integrated solution for the analysis and publication-grade graphical presentation of dithionite scramblase assays, as well as a platform for review, dissemination and extension of the strategies it employs. The package's name derives from a play on the fact that lipid scrambling is also sometimes referred to as 'flipping'. The package was originally published as Cotton, R.J., Ploier, B., Goren, M.A., Menon, A.K., and Graumann, J. (2017). "flippant–An R package for the automated analysis of fluorescence-based scramblase assays." BMC Bioinformatics 18, 146. <DOI:10.1186/s12859-017-1542-y>.
Handles univariate non-parametric density estimation with parametric starts and asymmetric kernels in a simple and flexible way. Kernel density estimation with parametric starts involves fitting a parametric density to the data before making a correction with kernel density estimation, see Hjort & Glad (1995) <doi:10.1214/aos/1176324627>. Asymmetric kernels make kernel density estimation more efficient on bounded intervals such as (0, 1) and the positive half-line. Supported asymmetric kernels are the gamma kernel of Chen (2000) <doi:10.1023/A:1004165218295>, the beta kernel of Chen (1999) <doi:10.1016/S0167-9473(99)00010-9>, and the copula kernel of Jones & Henderson (2007) <doi:10.1093/biomet/asm068>. User-supplied kernels, parametric starts, and bandwidths are supported.
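The idea of a parametric start can be sketched in a few lines of Python: fit a parametric density, then multiply it by a kernel-smoothed correction factor, as in the Hjort and Glad estimator. This is a minimal sketch rather than the package's interface. The normal start, the symmetric Gaussian kernel (the package focuses on asymmetric kernels), the function names, and the fixed bandwidth are all assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kde_parametric_start(x, data, bandwidth):
    """Normal start multiplied by a kernel-smoothed correction factor.

    Computes f0(x) * mean_i[ K_h(x - X_i) / f0(X_i) ], where f0 is the
    normal density fitted by maximum likelihood and K_h is a Gaussian kernel.
    """
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((d - mu) ** 2 for d in data) / n)  # ML estimate
    correction = sum(
        normal_pdf(x, d, bandwidth) / normal_pdf(d, mu, sigma) for d in data
    ) / n
    return normal_pdf(x, mu, sigma) * correction


# Hypothetical data and bandwidth, for illustration only.
observations = [-1.0, 0.0, 1.0]
estimate_at_half = kde_parametric_start(0.5, observations, bandwidth=0.4)
```

When the parametric start is close to the true density the correction factor stays near one, so the kernel step has less to estimate.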
This package provides a utility library to facilitate the generalization of statistical methods built on a regression framework. Package developers can use modelObj methods to initiate a regression analysis without concern for the details of the regression model and the method to be used to obtain parameter estimates. The specifics of the regression step are left to the user to define when calling the function. The user of a function developed within the modelObj framework creates as input a modelObj that contains the model and the R methods to be used to obtain parameter estimates and to obtain predictions. In this way, a user can easily go from linear to non-linear models within the same package.
Recent advances in single cell/nucleus transcriptomic technology have enabled collection of cohort-scale datasets to study cell type specific gene expression differences associated with disease state, stimulus, and genetic regulation. The scale of these data, complex study designs, and low read count per cell mean that characterizing cell type specific molecular mechanisms requires a user-friendly, purpose-built analytical framework. We have developed the dreamlet package that applies a pseudobulk approach and fits a regression model for each gene and cell cluster to test differential expression across individuals associated with a trait of interest. Use of precision-weighted linear mixed models enables accounting for repeated measures study designs, high dimensional batch effects, and varying sequencing depth or observed cells per biosample.
This package implements the algorithm described in Barron, M., Zhang, S. and Li, J. 2017, "A sparse differential clustering algorithm for tracing cell type changes via single-cell RNA-sequencing data", Nucleic Acids Research, gkx1113, <doi:10.1093/nar/gkx1113>. This algorithm clusters samples from two different populations, links the clusters across the conditions and identifies marker genes for these changes. The package was designed for scRNA-Seq data but is also applicable to many other data types; simply replace cells with samples and genes with variables. The package also contains functions for estimating the parameters for SparseDC as outlined in the paper. We recommend that users further select their marker genes using the magnitude of the cluster centers.
Application of the PAM (Partitioning Around Medoids) algorithm to samples from single cell sequencing techniques with a high number of cells (as many as the computer memory allows). The package uses a binary format to store matrices (either full, sparse or symmetric) in files written to disk. These files can contain any data type (not just double), which allows the matrices to be manipulated when memory is sufficient to load them as int or float, but not as double. The PAM implementation runs in parallel, using several or all of the cores of the machine, if available. This package shares a great part of its code with packages jmatrix and parallelpam but their functionality is included here so there is no need to install them.
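For readers unfamiliar with PAM, the swap phase can be sketched in Python as follows. This is a simplified, single-threaded illustration and not the package's parallel implementation. It assumes a precomputed dissimilarity matrix, uses a naive initialisation in place of the usual BUILD phase, and accepts the first improving swap rather than the best one. All names are hypothetical.

```python
def total_cost(dist, medoids):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam_swap(dist, k):
    """Greedy medoid swapping on a precomputed dissimilarity matrix."""
    n = len(dist)
    medoids = list(range(k))  # naive start; classical PAM uses a BUILD phase
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:  # accept any swap that lowers the total cost
                    medoids, best, improved = trial, cost, True
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels, best


# Hypothetical one-dimensional points forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dissimilarity = [[abs(a - b) for b in points] for a in points]
medoids, labels, cost = pam_swap(dissimilarity, k=2)
```

Because each accepted swap strictly lowers the total cost, the loop terminates. The medoids are always actual observations, which is why only the dissimilarity matrix is needed.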
This is a collection of functions optimized for working with various kinds of text matrices. Focusing on the text matrix as the primary object - represented either as a base R dense matrix or a Matrix package sparse matrix - allows for a consistent and intuitive interface that stays close to the underlying mathematical foundation of computational text analysis. In particular, the package includes functions for working with word embeddings, text networks, and document-term matrices. Methods developed in Stoltz and Taylor (2019) <doi:10.1007/s42001-019-00048-6>, Taylor and Stoltz (2020) <doi:10.1007/s42001-020-00075-8>, Taylor and Stoltz (2020) <doi:10.15195/v7.a23>, and Stoltz and Taylor (2021) <doi:10.1016/j.poetic.2021.101567>.
This package provides a Bayesian Nonparametric model for the study of time-evolving frequencies, which has become renowned in the study of population genetics. The model consists of a Hidden Markov Model (HMM) in which the latent signal is a distribution-valued stochastic process that takes the form of a finite mixture of Dirichlet Processes, indexed by vectors that count how many times each value is observed in the population. The package implements methodologies presented in Ascolani, Lijoi and Ruggiero (2021) <doi:10.1214/20-BA1206> and Ascolani, Lijoi and Ruggiero (2023) <doi:10.3150/22-BEJ1504> that make it possible to study the process at the time of data collection or to predict its evolution in the future or in the past.
Allows the user to generate a list of features (gene, pseudo, RNA, CDS, and/or UTR) directly from the NCBI database for any species with a current build available. An option to save the downloaded and formatted files is available, and the user can prioritize the feature list based on type and on the assembly builds present in the current build used. The user can then use the generated list of features, or provide their own, to map a set of markers (designed for SNP markers with a single base pair position available) to the closest feature based on the map build. This function requires map positions of the markers to be provided, and the positions should be based on the build being queried through NCBI.
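The marker-to-feature mapping step can be pictured with a small Python sketch. It is an illustration under simplifying assumptions, not the package's method: all features sit on one chromosome, each is a hypothetical `(name, start, end)` tuple, and distance is measured to the nearest feature boundary (zero inside the feature).

```python
def distance_to_feature(position, start, end):
    """Base-pair distance from a marker to a feature; 0 if the marker lies inside it."""
    if start <= position <= end:
        return 0
    return min(abs(position - start), abs(position - end))


def closest_feature(position, features):
    """Return (feature_name, distance) for the feature nearest to the marker."""
    name, start, end = min(
        features, key=lambda f: distance_to_feature(position, f[1], f[2])
    )
    return name, distance_to_feature(position, start, end)


# Hypothetical features on a single chromosome: (name, start, end).
features = [("geneA", 100, 200), ("geneB", 500, 650), ("geneC", 900, 950)]
nearest = closest_feature(320, features)
```

As the description notes, marker positions and feature coordinates must come from the same build for such distances to be meaningful.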
Constructs principal surfaces, which are two-dimensional surfaces that pass through the middle of a p-dimensional data set. They minimise the distance from the data points and provide a nonlinear summary of the data. The surfaces are nonparametric and their shape is suggested by the data. A surface is formed by an iterative procedure that starts with a linear summary, typically a principal component plane. Each successive iteration is a local average of the p-dimensional points, where an average is based on a projection of a point onto the nonlinear surface of the previous iteration. For more information on principal surfaces, see Ganey, R. (2019) <https://open.uct.ac.za/items/4e655d7d-d10c-481b-9ccc-801903aebfc8>.
The fossil record is a joint expression of ecological, taphonomic, evolutionary, and stratigraphic processes (Holland and Patzkowsky, 2012, ISBN:978-0226649382). This package allows users to simulate biological processes in the time domain (e.g., trait evolution, fossil abundance, phylogenetic trees), and to examine how their expression in the rock record (stratigraphic domain) is influenced by age-depth models, ecological niche models, and taphonomic effects. Functions simulating common processes used in modeling trait evolution, biostratigraphy or event type data such as first/last occurrences are provided and can be used standalone or as part of a pipeline. The package comes with example data sets and tutorials in several vignettes, which can be used as a template to set up one's own simulation.
This package provides tools for building Rescorla-Wagner Models for Two-Alternative Forced Choice tasks, commonly employed in psychological research. Most concepts and ideas within this R package are drawn from Sutton and Barto (2018) <ISBN:9780262039246>. The package allows for the intuitive definition of reinforcement learning (RL) models using simple if-else statements, and the three basic models built into this R package are taken from Niv et al. (2012) <doi:10.1523/JNEUROSCI.5498-10.2012>. Our approach to constructing and evaluating these computational models is informed by the guidelines proposed in Wilson & Collins (2019) <doi:10.7554/eLife.49547>. Example datasets included with the package are sourced from the work of Mason et al. (2024) <doi:10.3758/s13423-023-02415-x>.
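A Rescorla-Wagner learner for a two-alternative forced choice task can be written compactly. The Python sketch below shows the standard delta-rule update combined with a softmax (logistic) choice rule. It does not reproduce the package's interface or its three built-in models: the function names, the option labels, the initial values of zero, and the parameter defaults are assumptions.

```python
import math


def softmax_choice_prob(q_left, q_right, inverse_temperature):
    """Probability of choosing 'left' under a two-option softmax rule."""
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (q_left - q_right)))


def rescorla_wagner_update(value, reward, learning_rate):
    """Move the value toward the reward by a fraction of the prediction error."""
    return value + learning_rate * (reward - value)


def run_2afc(choices, rewards, learning_rate=0.3, inverse_temperature=2.0):
    """Replay observed choices and rewards; return final values and log-likelihood."""
    values = {"left": 0.0, "right": 0.0}
    log_likelihood = 0.0
    for choice, reward in zip(choices, rewards):
        p_left = softmax_choice_prob(
            values["left"], values["right"], inverse_temperature
        )
        p_choice = p_left if choice == "left" else 1.0 - p_left
        log_likelihood += math.log(p_choice)
        # Only the chosen option is updated in this basic variant.
        values[choice] = rescorla_wagner_update(values[choice], reward, learning_rate)
    return values, log_likelihood


# Hypothetical two-trial session in which 'left' is chosen and rewarded twice.
final_values, session_ll = run_2afc(["left", "left"], [1, 1], learning_rate=0.5)
```

Maximising the returned log-likelihood over the learning rate and inverse temperature is one common way to fit such a model to choice data.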
This package provides tools for developing and validating prediction models, estimating the expected survival of patients, and visualizing the results graphically. Most of the implemented methods are based on penalized regressions such as: the lasso (Tibshirani R (1996)), the elastic net (Zou H et al. (2005) <doi:10.1111/j.1467-9868.2005.00503.x>), the adaptive lasso (Zou H (2006) <doi:10.1198/016214506000000735>), the stability selection (Meinshausen N et al. (2010) <doi:10.1111/j.1467-9868.2010.00740.x>), some extensions of the lasso (Ternes N et al. (2016) <doi:10.1002/sim.6927>), some methods for the interaction setting (Ternes N et al. (2016) <doi:10.1002/bimj.201500234>), or others. A function for generating simulated survival data sets is also provided.
This package provides methods for estimation and hypothesis testing of proportions in group testing designs: methods for estimating a proportion in a single population (assuming sensitivity and specificity equal to 1 in designs with equal group sizes), as well as hypothesis tests and functions for experimental design for this situation. For estimating one proportion or the difference of proportions, a number of confidence interval methods are included, which can deal with various pool sizes. Further, regression methods are implemented for simple pooling and matrix pooling designs. Methods for identification of positive items in group testing designs are also included: optimal testing configurations can be found for hierarchical and array-based algorithms. Operating characteristics can be calculated for testing configurations across a wide variety of situations.
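For the simplest case described above (a single population, equal group sizes, perfect sensitivity and specificity), the maximum likelihood estimate of the individual-level proportion has a closed form. A pool tests negative only if all of its members are negative, so the pool-negative rate estimates (1 - p)^s for pool size s. The Python sketch below shows this calculation only. The function name and the example counts are hypothetical, and it covers none of the package's interval, testing, or regression methods.

```python
def pooled_proportion_mle(positive_pools, total_pools, pool_size):
    """MLE of the individual proportion from equally sized pools and a perfect assay.

    Solves (1 - p) ** pool_size = 1 - positive_pools / total_pools for p.
    """
    if total_pools <= 0 or pool_size <= 0:
        raise ValueError("total_pools and pool_size must be positive")
    if not 0 <= positive_pools <= total_pools:
        raise ValueError("positive_pools must lie between 0 and total_pools")
    pool_negative_rate = 1.0 - positive_pools / total_pools
    return 1.0 - pool_negative_rate ** (1.0 / pool_size)


# Hypothetical design: 3 of 4 pools of size 2 test positive.
estimate = pooled_proportion_mle(positive_pools=3, total_pools=4, pool_size=2)
```

With a pool size of 1 the formula reduces to the ordinary sample proportion.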
Given a likelihood provided by the user, this package applies it to a given matrix dataset in order to find change points in the data that maximize the sum of the likelihoods of all the segments. This package provides a handful of algorithms with different time complexities and assumption compromises so the user is able to choose the best one for the problem at hand. The implementation of the segmentation algorithms in this package is based on the paper by Bruno M. de Castro, Florencia Leonardi (2018) <arXiv:1501.01756>. The Berlin weather sample dataset was provided by Deutscher Wetterdienst <https://dwd.de/>. You can find all the references in the Acknowledgments section of this package's repository via the URL below.
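The general idea of choosing change points that maximise a sum of user-supplied segment scores can be sketched with an exact dynamic program. The Python code below is an illustration, not one of the package's algorithms. It assumes a one-dimensional list instead of a matrix, a Gaussian-style score (negative sum of squared deviations from the segment mean), and a fixed per-segment penalty, added here so that the optimum is not simply one segment per observation. All names are hypothetical.

```python
def segment_score(values):
    """Example user-supplied score: negative squared deviation from the segment mean."""
    mean = sum(values) / len(values)
    return -sum((v - mean) ** 2 for v in values)


def best_segmentation(data, score, penalty):
    """Exact O(n^2) dynamic program over all segmentations.

    best[end] holds the best penalised total score for data[:end];
    last_cut[end] records where the final segment of that solution starts.
    """
    n = len(data)
    best = [0.0] + [float("-inf")] * n
    last_cut = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end):
            total = best[start] + score(data[start:end]) - penalty
            if total > best[end]:
                best[end], last_cut[end] = total, start
    # Walk the recorded cuts backwards to recover the change points.
    change_points, pos = [], n
    while pos > 0:
        pos = last_cut[pos]
        if pos > 0:
            change_points.append(pos)
    return sorted(change_points), best[n]


# Hypothetical series with a single level shift after the third observation.
series = [0, 0, 0, 5, 5, 5]
change_points, total_score = best_segmentation(series, segment_score, penalty=1.0)
```

The quadratic cost of this exact search is one reason faster, approximate alternatives with different assumptions are useful.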
Statistical procedures to perform stability analysis in plant breeding and to identify stable genotypes under diverse environments. It is possible to calculate coefficient of homeostaticity by Khangildin et al. (1979), variance of specific adaptive ability by Kilchevsky & Khotyleva (1989), weighted homeostaticity index by Martynov (1990), steadiness of stability index by Udachin (1990), superiority measure by Lin & Binns (1988) <doi:10.4141/cjps88-018>, regression on environmental index by Eberhart & Russell (1966) <doi:10.2135/cropsci1966.0011183X000600010011x>, Tai's (1971) stability parameters <doi:10.2135/cropsci1971.0011183X001100020006x>, stability variance by Shukla (1972) <doi:10.1038/hdy.1972.87>, ecovalence by Wricke (1962), nonparametric stability parameters by Nassar & Huehn (1987) <doi:10.2307/2531947>, and Francis & Kannenberg's parameters of stability (1978) <doi:10.4141/cjps78-157>.
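As an illustration of one measure in this list, Wricke's ecovalence for genotype i is the sum over environments j of the squared interaction residual (x_ij - mean_i - mean_j + grand mean)^2. The Python sketch below computes it from a complete genotype-by-environment table of means. The function name and the toy table are hypothetical, and the package's own functions and input format may differ.

```python
def ecovalence(table):
    """Wricke's ecovalence per genotype from a genotype-by-environment table of means.

    table[i][j] is the mean of genotype i in environment j (complete, balanced table).
    """
    n_geno, n_env = len(table), len(table[0])
    geno_means = [sum(row) / n_env for row in table]
    env_means = [sum(table[i][j] for i in range(n_geno)) / n_geno for j in range(n_env)]
    grand_mean = sum(geno_means) / n_geno
    return [
        sum(
            (table[i][j] - geno_means[i] - env_means[j] + grand_mean) ** 2
            for j in range(n_env)
        )
        for i in range(n_geno)
    ]


# Hypothetical 2 x 2 table with a pure crossover interaction.
crossover = [[1.0, 2.0], [2.0, 1.0]]
interaction_sums = ecovalence(crossover)
```

A genotype whose response across environments is purely additive has an ecovalence of zero. Larger values indicate a larger contribution to the genotype-by-environment interaction.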
This package provides a tool for matching ICD-10 codes to corresponding Clinical Classification Software Refined (CCSR) codes. The main function, CCSRfind(), identifies each CCSR code that applies to an individual given their diagnosis codes. It also provides a summary of CCSR codes that are matched to a dataset. The package contains 3 datasets: DXCCSR (mapping of ICD-10 codes to CCSR codes), Legend (conversion of DXCCSR to CCSRfind-usable format for CCSR codes with 1000 or fewer ICD-10 diagnosis codes), and LegendExtend (conversion of DXCCSR to CCSRfind-usable format for CCSR codes with more than 1000 ICD-10 diagnosis codes). The disc() function applies grepl() ('base') to multiple columns and is used in CCSRfind().
Estimation of the generalized beta distribution of the second kind (GB2) and related models using grouped data in the form of income shares. The GB2 family is a general class of distributions that provides an accurate fit to income data. GB2group includes functions to estimate the GB2, the Singh-Maddala, the Dagum, the Beta 2, the Lognormal and the Fisk distributions. GB2group deploys two different econometric strategies to estimate these parametric distributions, the equally weighted minimum distance (EWMD) estimator and the optimally weighted minimum distance (OMD) estimator. Asymptotic standard errors are reported for the OMD estimates. Standard errors of the EWMD estimates are obtained by Monte Carlo simulation. See Jorda et al. (2018) <arXiv:1808.09831> for a detailed description of the estimation procedure.
Nonparametric methods for landmark prediction of long-term survival outcomes, incorporating covariate and short-term event information. The package supports the construction of flexible varying-coefficient models that use discrete covariates, as well as multiple continuous covariates. The goal is to improve prediction accuracy when censored short-term events are available as predictors, using robust nonparametric procedures that do not require correct model specification and avoid restrictive parametric assumptions found in alternative methods. More information on these methods can be found in Parast et al. 2012 <doi:10.1080/01621459.2012.721281>, Parast et al. 2011 <doi:10.1002/bimj.201000150>, and Parast and Cai 2013 <doi:10.1002/sim.5776>. A tutorial for this package is available here: <https://www.laylaparast.com/landpred>.
Performs statistical hypothesis testing for loadings in principal component analysis (PCA) (Yamamoto, H. et al. (2014) <doi:10.1186/1471-2105-15-51>), orthogonal smoothed PCA (OS-PCA) (Yamamoto, H. et al. (2021) <doi:10.3390/metabo11030149>), one-sided kernel PCA (Yamamoto, H. (2023) <doi:10.51094/jxiv.262>), partial least squares (PLS) and PLS discriminant analysis (PLS-DA) (Yamamoto, H. et al. (2009) <doi:10.1016/j.chemolab.2009.05.006>), PLS with rank order of groups (PLS-ROG) (Yamamoto, H. (2017) <doi:10.1002/cem.2883>), regularized canonical correlation analysis discriminant analysis (RCCA-DA) (Yamamoto, H. et al. (2008) <doi:10.1016/j.bej.2007.12.009>), and multiset PLS and PLS-ROG (Yamamoto, H. (2022) <doi:10.1101/2022.08.30.505949>).