Studies nowadays commonly collect multiple longitudinal features, and appropriately integrating them to define individual clusters is increasingly crucial for understanding population heterogeneity and predicting future outcomes. BCClong implements a Bayesian consensus clustering (BCC) model for multiple longitudinal features via generalized linear mixed models. Compared to existing packages, several key features make BCClong appealing: (a) it allows simultaneous clustering of mixed-type (e.g., continuous, discrete, and categorical) longitudinal features; (b) it allows each longitudinal feature to be collected from a different source, with measurements taken at distinct sets of time points (irregularly sampled longitudinal data); and (c) it relaxes the assumption that all features share the same clustering structure by estimating feature-specific (local) clusterings alongside a consensus (global) clustering.
Standard generalized additive models assume a known response function, which imposes an assumption on the shape of the distribution of the response. Misspecifying the response function, however, results in biased estimates. In Spiegel et al. (2017) <doi:10.1007/s11222-017-9799-6> we therefore propose to estimate the response function jointly with the covariate effects. This package provides the underlying functions to estimate these generalized additive models with flexible response functions. The estimation is based on an iterative algorithm: in the outer loop the response function is estimated, while in the inner loop the covariate effects are determined. The response function is modeled as a strictly monotone P-spline, and the covariate effects are estimated with a modified Fisher scoring algorithm. The estimation builds on the mgcv package.
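As a rough illustration of the alternating scheme (not this package's implementation), the sketch below estimates a monotone response function with isotonic regression standing in for the monotone P-spline, and updates a single covariate effect by direct likelihood optimization standing in for the modified Fisher scoring step; the identifiability normalization of the real method is omitted.

    set.seed(1)
    n <- 500
    x <- runif(n)
    y <- rbinom(n, 1, pnorm(2 * x - 1))  # true response function, treated as unknown
    beta <- 1                            # initial covariate effect
    for (it in 1:20) {
      eta <- beta * x
      # outer step: monotone estimate of the response function given eta
      iso <- isoreg(eta, y)
      h_hat <- approxfun(sort(eta), pmin(pmax(iso$yf, 1e-6), 1 - 1e-6), rule = 2)
      # inner step: update the covariate effect given the current h_hat
      nll <- function(b) -sum(dbinom(y, 1, h_hat(b * x), log = TRUE))
      beta <- optimize(nll, c(0.1, 5))$minimum
    }
    beta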
This package offers variation- and regression-based yield stability analyses, with resampling techniques integrated throughout. The function stab.mean() provides genotypic means and ranks together with their confidence intervals. The function stab.var() provides genotypic variances over environments together with their confidence intervals. The function stab.fw() extends the Finlay-Wilkinson (1963) method; it can include additional factors that might affect yield stability, integrates resampling, and tolerates a few missing data points or unbalanced data. The function stab.fw.check() also extends the Finlay-Wilkinson (1963) method, evaluating yield stability relative to common check line(s), again with resampling integrated.
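For orientation, here is a minimal base-R version of the classical Finlay-Wilkinson (1963) regression that stab.fw() extends (simulated data; this is the textbook computation, not the package's interface):

    set.seed(42)
    dat <- expand.grid(gen = paste0("G", 1:5), env = paste0("E", 1:8))
    dat$yield <- rnorm(nrow(dat), mean = 50, sd = 5)
    env_index <- tapply(dat$yield, dat$env, mean)   # environmental index
    dat$index <- env_index[as.character(dat$env)]
    # per-genotype regression of yield on the environmental index;
    # a slope near 1 indicates average stability
    sapply(split(dat, dat$gen),
           function(d) coef(lm(yield ~ index, data = d))[["index"]])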
Logistic-normal multinomial (LNM) models are common in problems with multivariate count data. This package gives a simple implementation built around a 30-line Stan script. This lightweight implementation makes it an easy starting point for other projects, in particular for downstream tasks that require analysis of "compositional" data. It can be applied whenever a multinomial probability parameter is thought to depend linearly on inputs in a transformed, log-ratio space. Additional utilities make it easy to inspect fitted models, create predictions, and draw samples. More about the LNM can be found in Xia et al. (2013) "A Logistic Normal Multinomial Regression Model for Microbiome Compositional Data Analysis" <doi:10.1111/biom.12079> and Sankaran and Holmes (2023) "Generative Models: An Interdisciplinary Perspective" <doi:10.1146/annurev-statistics-033121-110134>.
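To make the log-ratio idea concrete, the following base-R snippet simulates from a logistic-normal multinomial: a linear predictor plus Gaussian noise in a (K-1)-dimensional log-ratio space is mapped to probabilities and then to counts (a conceptual sketch, not the package's Stan model):

    set.seed(1)
    n <- 100; K <- 4                          # samples and categories
    x <- matrix(rnorm(n), ncol = 1)           # one input variable
    B <- matrix(rnorm(K - 1), nrow = 1)       # coefficients in log-ratio space
    eta <- cbind(x %*% B + matrix(rnorm(n * (K - 1), sd = 0.3), n), 0)
    p <- exp(eta) / rowSums(exp(eta))         # inverse additive log-ratio map
    counts <- t(apply(p, 1, function(pr) rmultinom(1, size = 500, prob = pr)))
    head(counts)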
Meta-analysis is performed to increase statistical power by integrating results from several experiments. P-values are often combined in meta-analysis when effect sizes are not available. The metapro R package provides not only traditional methods (Becker BJ (1994, ISBN:0-87154-226-9), Mosteller, F. & Bush, R.R. (1954, ISBN:0201048523), and Lancaster HO (1949, ISSN:00063444)), but also a new method we developed, the weighted Fisher's method. While the (weighted) Z-method is suitable for finding features effective in most experiments, the (weighted) Fisher's method is useful for detecting partially associated features. Users can therefore choose the function that matches their purpose. See Yoon et al. (2021) "Powerful p-value combination methods to detect incomplete association" <doi:10.1038/s41598-021-86465-y>.
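As a reference point, Fisher's classic (unweighted) combination is a few lines of base R; metapro's weighted variants generalize this by weighting the studies, for instance by sample size:

    p <- c(0.01, 0.20, 0.03, 0.50)                        # per-study p-values
    stat <- -2 * sum(log(p))                              # Fisher's statistic
    pchisq(stat, df = 2 * length(p), lower.tail = FALSE)  # combined p-value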
Calculates and depicts probabilistic long-term effects in binary models with temporal dependence variables. The package performs two tasks. First, it calculates the change in the probability of the event occurring given a change in a theoretical variable. Second, it calculates the rolling difference in the future probability of the event under two scenarios: one in which the event occurred at a given time and one in which it did not. The package is consistent with the recent movement to depict meaningful and easy-to-interpret quantities of interest with the requisite measures of uncertainty. It is the first to make it easy for researchers to interpret short- and long-term effects of explanatory variables in binary autoregressive models, which can have important implications for the correct interpretation of these models.
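A hand-rolled illustration of the quantity of interest (simulated data and a plain logit with a time-since-last-event regressor; this mimics the idea, not the package's API):

    set.seed(7)
    n <- 300
    x <- rnorm(n)
    tsle <- sample(0:10, n, replace = TRUE)         # time since last event
    y <- rbinom(n, 1, plogis(-1 + 0.8 * x - 0.15 * tsle))
    fit <- glm(y ~ x + tsle, family = binomial)
    horizon <- 0:10
    p_event <- predict(fit, data.frame(x = 0, tsle = horizon), type = "response")
    p_none  <- predict(fit, data.frame(x = 0, tsle = horizon + 10), type = "response")
    p_event - p_none   # rolling difference between the two scenarios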
This package provides a toolbox of fast, native and parallel implementations of various information-based importance criteria estimators and feature selection filters based on them, inspired by the overview by Brown, Pocock, Zhao and Lujan (2012) <https://www.jmlr.org/papers/v13/brown12a.html>. It contains, among others, the minimum redundancy maximal relevance ('mRMR') method by Peng, Long and Ding (2005) <doi:10.1109/TPAMI.2005.159>; the joint mutual information ('JMI') method by Yang and Moody (1999) <https://papers.nips.cc/paper/1779-data-visualization-and-feature-selection-new-algorithms-for-nongaussian-data>; the double input symmetrical relevance ('DISR') method by Meyer and Bontempi (2006) <doi:10.1007/11732242_9>; and the joint mutual information maximisation ('JMIM') method by Bennasar, Hicks and Setchi (2015) <doi:10.1016/j.eswa.2015.07.007>.
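The filters share a common interface: a feature matrix or data frame X, an outcome Y, and the number k of features to select. A hedged sketch using the example data shipped with the package (verify argument details against the package documentation):

    library(praznik)
    data(MadelonD)                        # example data bundled with praznik
    MRMR(MadelonD$X, MadelonD$Y, k = 5)$selection
    JMIM(MadelonD$X, MadelonD$Y, k = 5)$selection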
cn.mops (Copy Number estimation by a Mixture Of PoissonS) is a data processing pipeline for copy number variations and aberrations (CNVs and CNAs) from next-generation sequencing (NGS) data. The package supplies functions to convert BAM files into read count matrices or genomic ranges objects, which are the input objects for cn.mops. cn.mops models the depths of coverage across samples at each genomic position and therefore does not suffer from read count biases along chromosomes. Using a Bayesian approach, cn.mops decomposes read variations across samples into integer copy numbers and noise via its mixture components and Poisson distributions, respectively. cn.mops achieves a low FDR because wrong detections are indicated by high noise and filtered out. cn.mops is very fast and written in C++.
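A typical workflow sketch (the BAM paths are placeholders; the function names follow the package's standard pipeline, but check the vignette for current arguments):

    library(cn.mops)
    bam_files <- list.files("bams", pattern = "\\.bam$", full.names = TRUE)
    counts <- getReadCountsFromBAM(bam_files)       # read counts as genomic ranges
    res <- calcIntegerCopyNumbers(cn.mops(counts))  # mixture model + integer CNs
    cnvs(res)                                       # detected CNV regions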
In p >> n settings, full posterior sampling using existing Markov chain Monte Carlo (MCMC) algorithms is highly inefficient and often not feasible in practice. To overcome this problem, we propose a scalable stochastic search algorithm, the Simplified Shotgun Stochastic Search (S5), aimed at rapidly exploring interesting regions of the model space and finding the maximum a posteriori (MAP) model. S5 also provides an approximation of the posterior probability of each model (including the marginal inclusion probabilities). The algorithm is described in "Scalable Bayesian Variable Selection Using Nonlocal Prior Densities in Ultrahigh-dimensional Settings" (2018) by Minsuk Shin, Anirban Bhattacharya, and Valen E. Johnson, and in "Nonlocal Functional Priors for Nonparametric Hypothesis Testing and High-dimensional Model Selection" (2020+) by Minsuk Shin and Anirban Bhattacharya.
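The flavor of such a stochastic search can be conveyed in a toy base-R sketch that moves over add/drop neighbors of the current model, scoring models by BIC as a stand-in for the nonlocal-prior posterior used in the actual method:

    set.seed(3)
    n <- 100; p <- 20
    X <- matrix(rnorm(n * p), n)
    y <- drop(X[, 1:3] %*% c(2, -1, 1)) + rnorm(n)
    score <- function(m)
      -BIC(if (length(m)) lm(y ~ X[, m, drop = FALSE]) else lm(y ~ 1))
    cur <- integer(0); best <- cur; best_s <- score(cur)
    for (it in 1:100) {
      nb <- lapply(1:p, function(j) if (j %in% cur) setdiff(cur, j) else c(cur, j))
      s <- vapply(nb, score, numeric(1))
      cur <- nb[[sample.int(p, 1, prob = exp(s - max(s)))]]  # randomized move
      if (max(s) > best_s) { best_s <- max(s); best <- nb[[which.max(s)]] }
    }
    sort(best)   # model found by the search (true signal: 1, 2, 3)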
This package provides a scalable implementation of the highly adaptive lasso algorithm, including routines for constructing sparse matrices of basis functions of the observed data, as well as a custom implementation of lasso regression tailored to enhance efficiency when the matrix of predictors is composed exclusively of indicator functions. For ease of use and increased flexibility, the lasso fitting routines invoke code from the glmnet package by default. The highly adaptive lasso was first formulated and described by MJ van der Laan (2017) <doi:10.1515/ijb-2015-0097>, with practical demonstrations of its performance given by Benkeser and van der Laan (2016) <doi:10.1109/DSAA.2016.93>. This implementation of the highly adaptive lasso algorithm was described by Hejazi, Coyle, and van der Laan (2020) <doi:10.21105/joss.02526>.
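A hedged minimal fit with fit_hal(), the package's main entry point (argument names follow its documentation; double-check against your installed version):

    library(hal9001)
    set.seed(11)
    x <- matrix(rnorm(600), ncol = 3)
    y <- sin(x[, 1]) + 0.5 * x[, 2]^2 + rnorm(200, sd = 0.2)
    fit <- fit_hal(X = x, Y = y)          # glmnet-backed lasso on indicator bases
    mean((predict(fit, new_data = x) - y)^2)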
Researchers often have expectations about the relations between means of different groups or about standardized regression coefficients; using informative hypothesis testing to incorporate these expectations into the analysis through order constraints increases statistical power (Vanbrabant and Rosseel (2020) <doi:10.4324/9780429273872-14>). Another valuable tool, the Bayes factor, can evaluate evidence for multiple hypotheses without concerns about multiple testing and can be used in Bayesian updating (Hoijtink, Mulder, van Lissa & Gu (2019) <doi:10.1037/met0000201>). The bain R package enables informative hypothesis testing using the Bayes factor. The mmibain package provides shiny web applications based on bain. The RepliCrisis() function launches a shiny card game to simulate the evaluation of replication studies, while the mmibain() function launches a shiny application to fit Bayesian informative hypothesis evaluation models with bain.
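Both applications are launched without arguments, in line with the description above:

    library(mmibain)
    mmibain()        # fit bain informative-hypothesis models interactively
    # RepliCrisis()  # the replication-studies card game (run one app at a time)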
This package provides a set of tools for efficient estimation of penalized Poisson Pseudo Maximum Likelihood (PPML) regressions, using lasso or ridge penalties, for models that feature one or more sets of high-dimensional fixed effects (HDFE). The methodology is based on Breinlich, Corradi, Rocha, Ruta, Santos Silva, and Zylkin (2021) <http://hdl.handle.net/10986/35451> and takes advantage of the method of alternating projections of Gaure (2013) <doi:10.1016/j.csda.2013.03.024> for dealing with HDFE, as well as the coordinate descent algorithm of Friedman, Hastie and Tibshirani (2010) <doi:10.18637/jss.v033.i01> for fitting lasso regressions. The package can also carry out cross-validation and implements the plugin lasso of Belloni, Chernozhukov, Hansen and Kozbur (2016) <doi:10.1080/07350015.2015.1102733>.
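For reference, the unpenalized PPML baseline is just a Poisson pseudo-likelihood fit with fixed effects as factors (base R; the package adds the penalties and makes the fixed-effects dimension tractable):

    set.seed(5)
    d <- expand.grid(exporter = factor(1:10), importer = factor(1:10))
    d$dist <- runif(nrow(d), 1, 10)
    d$trade <- rpois(nrow(d), exp(2 - 0.3 * log(d$dist)))
    fit <- glm(trade ~ log(dist) + exporter + importer,
               family = quasipoisson, data = d)
    coef(fit)["log(dist)"]   # the "gravity" elasticity, robust to overdispersion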
Epigenome-wide association studies (EWAS) detect a large number of DNA methylation differences, often hundreds of differentially methylated regions and thousands of CpGs, that are significantly associated with a disease; many of them are located in non-coding regions. There is therefore a critical need to better understand the functional impact of these CpG methylations and to further prioritize the significant changes. MethReg is an R package for integrative modeling of DNA methylation, target gene expression, and transcription factor (TF) binding site data to systematically identify and rank functional CpG methylations. MethReg evaluates, prioritizes, and annotates CpG sites with high regulatory potential using matched methylation and gene expression data, along with external TF-target interaction databases based on manual curation, ChIP-seq experiments, or gene regulatory network analysis.
Constrained randomization, proposed by Raab and Butcher (2001) <doi:10.1002/1097-0258(20010215)20:3%3C351::AID-SIM797%3E3.0.CO;2-C>, is suitable for cluster randomized trials (CRTs) with a small number of clusters (e.g., 20 or fewer). The procedure constrains randomization based on the baseline values of specified cluster-level covariates. The intervention effect on the individual outcome can then be analyzed with the clustered permutation test introduced by Gail et al. (1996) <doi:10.1002/(SICI)1097-0258(19960615)15:11%3C1069::AID-SIM220%3E3.0.CO;2-Q>. Motivated by Li et al. (2016) <doi:10.1002/sim.7410>, the package performs constrained randomization on the baseline values of cluster-level covariates and the clustered permutation test on the individual-level outcomes for cluster randomized trials.
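The core idea can be sketched in base R: draw many candidate allocations, keep the subset that best balances the baseline covariate, and randomize within that constrained set (illustrative only, not the package's interface):

    set.seed(9)
    k <- 12                                   # number of clusters
    covar <- rnorm(k)                         # baseline cluster-level covariate
    alloc <- replicate(5000, sample(rep(0:1, each = k / 2)))
    imb <- apply(alloc, 2,
                 function(a) abs(mean(covar[a == 1]) - mean(covar[a == 0])))
    constrained <- alloc[, imb <= quantile(imb, 0.1)]   # best-balanced 10%
    final <- constrained[, sample(ncol(constrained), 1)]
    final                                     # arm assignment per cluster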
Generates Raven-like matrices according to different rules, together with the response list associated with each matrix. The package can generate matrices composed of 4 or 9 cells, along with a response list of 11 elements (the correct response plus 10 incorrect responses). Matrices can be generated according to both logical rules (i.e., the relationships between the elements in the matrix are manipulated to create the matrix) and visual-spatial rules (i.e., the visual or spatial characteristics of the elements are manipulated to generate the matrix). The graphical elements of this package are based on the DescTools package. This package has been developed within the PRIN2020 Project (Prot. 20209WKCLL) titled "Computerized, Adaptive and Personalized Assessment of Executive Functions and Fluid Intelligence" and funded by the Italian Ministry of Education and Research.
This package provides a set of tools to assist statistical programmers in validating Study Data Tabulation Model (SDTM) domain data sets. Statistical programmers are required to validate that an SDTM domain data set has been programmed correctly, per the SDTM Implementation Guide (SDTMIG) by CDISC (<https://www.cdisc.org/standards/foundational/sdtmig>), the study specification, and the study protocol, using a process called double programming. Double programming involves two different programmers independently converting the raw electronic data capture (EDC) data into an SDTM domain data table and comparing their results to ensure accurate standardization of the data. One of these attempts is termed production and the other validation. Generally, production runs are the official programs for submittals and are written in SAS; validation runs can be programmed in another language, in this case R.
When gene-editing technology is combined with sequencing technology, one needs to reconstruct a lineage tree from the alleles generated and to calculate the similarity between each pair of groups. The FindIndel() and IndelForm() functions align each read to the reference sequence and generate scar form strings, respectively. The IndelIdents() function defines a scar form for each cell or read. The IndelPlot() function visualizes the distribution of deletions and insertions. The TagProcess() function extracts indels for each cell or read. The TagDist() function calculates the similarity between each pair of groups across the indels they contain. The BuildTree() function reconstructs a tree, and the PlotTree() function visualizes it.
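A compact base-R analogue of the TagDist()/BuildTree() steps, computing Jaccard similarities between groups from the indels they contain and clustering on the resulting distances (illustrative; the indel labels are made up and this is not the package's algorithm):

    indels <- list(g1 = c("del5_10", "ins2_AT"),
                   g2 = c("del5_10", "del20_22"),
                   g3 = c("ins2_AT", "del30_31"))
    jac <- function(a, b) length(intersect(a, b)) / length(union(a, b))
    sim <- outer(seq_along(indels), seq_along(indels),
                 Vectorize(function(i, j) jac(indels[[i]], indels[[j]])))
    dimnames(sim) <- list(names(indels), names(indels))
    plot(hclust(as.dist(1 - sim)))   # a simple tree from pairwise similarity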
An implementation of the statistical methods commonly used for advanced composite materials in aerospace applications. This package focuses on calculating basis values (lower tolerance bounds) for material strength properties, as well as performing the associated diagnostic tests. This package provides functions for calculating basis values assuming several different distributions, as well as providing functions for non-parametric methods of computing basis values. Functions are also provided for testing the hypothesis that there is no difference between strength and modulus data from an alternate sample and that from a "qualification" or "baseline" sample. For a discussion of these statistical methods and their use, see the Composite Materials Handbook, Volume 1 (2012, ISBN: 978-0-7680-7811-4). Additional details about this package are available in the paper by Kloppenborg (2020, <doi:10.21105/joss.02265>).
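A hedged sketch of computing a basis value with the package's normal-distribution routine (the function name matches the package family of basis_* functions; check the documentation for batch arguments and the diagnostic tests it runs):

    library(cmstatr)
    set.seed(21)
    strength <- rnorm(28, mean = 100, sd = 5)   # simulated coupon strengths
    basis_normal(x = strength)                  # B-basis lower tolerance bound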
Many new p-value-based multiple test procedures have been proposed recently, and these new methods are more powerful than the widely used Hochberg procedure. These procedures strongly control the familywise error rate (FWER). This package is a comprehensive collection of p-value-based FWER-controlling stepwise multiple test procedures, including six procedure families and thirty multiple test procedures. In this collection, the conservative Hochberg procedure, the linear-time Hommel procedures, the asymptotic Rom procedure, the Gou-Tamhane-Xi-Rom procedures, and the Quick procedures have all been developed since 2014. The package name "elitism" is an acronym of "e"quipment for "l"ogarithmic and l"i"near "ti"me "s"tepwise "m"ultiple hypothesis testing. See Gou, J. (2022), "Quick multiple test procedures and p-value adjustments", Statistics in Biopharmaceutical Research 14(4), 636-650.
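For context, the baseline Hochberg procedure is available in base R via p.adjust(); this package collects the newer, more powerful relatives discussed above:

    p <- c(0.001, 0.013, 0.02, 0.04, 0.30)
    p.adjust(p, method = "hochberg") <= 0.05   # rejections at FWER level 0.05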
This package provides functions for constructing Transformed and Relative Lorenz curves with survey sampling weights. Given a variable of interest measured in two groups whose survey weights are scaled so that their hypothetical populations are of equal size, tlorenz() computes the proportion of members of the group with smaller values (ordered from smallest to largest) needed for their sum to match the sum of the top qth percentile of the group with higher values. rlorenz() shows the fraction of the total value of the group with larger values held by the pth percentile of those in the group with smaller values. Fd() is a survey-weighted cumulative distribution function and Eps() is a survey-weighted inverse CDF, used in rlorenz(). See Ramos, Graubard, and Gastwirth (2025) <doi:10.1093/jrsssa/qnaf044>.
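The two building blocks are simple to sketch in base R: a survey-weighted CDF and its inverse (this mirrors what Fd() and Eps() are described to do, not their actual code):

    x <- c(2, 5, 1, 7); w <- c(1, 2, 1, 1)           # values and survey weights
    o <- order(x)
    cw <- cumsum(w[o]) / sum(w)                      # cumulative weight shares
    Fd_hat <- stepfun(x[o], c(0, cw))                # weighted CDF
    Eps_hat <- function(p) x[o][which(cw >= p)[1]]   # weighted inverse CDF
    c(Fd_hat(5), Eps_hat(0.5))                       # F(5) and the weighted median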
This package proposes a new metric, dependency heaviness, which measures the number of additional dependency packages that a parent package brings to its child package and that are unique to the dependency packages imported by all other parents. The dependency heaviness analysis is visualized by a customized heatmap. The package is described in <doi:10.1093/bioinformatics/btac449>. We have also performed the dependency heaviness analysis on the CRAN/Bioconductor package ecosystem and implemented the results as a web-based database that provides comprehensive tools for querying dependencies of individual R packages. The systematic analysis of the CRAN/Bioconductor ecosystem is described in <doi:10.1016/j.jss.2023.111610>. As of pkgndep version 2.0.0, the heaviness database includes snapshots of the CRAN/Bioconductor ecosystems for many older R versions.
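Typical usage per the package README (requires network access to resolve dependencies; the ggplot2 argument is just an example target package):

    library(pkgndep)
    x <- pkgndep("ggplot2")   # analyze the heaviness of ggplot2's parents
    plot(x)                   # the customized dependency heatmap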
A fundamental problem in biomedical research is the low number of observations, mostly due to a lack of available biosamples, prohibitive costs, or ethical reasons. Augmenting a few real observations with artificially generated samples could make analyses more robust and more reproducible. One possible solution to the problem is the use of generative models, statistical models that attempt to capture the entire probability distribution of the observations. Using the variational autoencoder (VAE), a well-known deep generative model, this package generates samples of gene expression data, especially single-cell RNA-seq data. Furthermore, the VAE can use conditioning to produce specific cell types or subpopulations: the conditional VAE (CVAE) allows us to create targeted samples rather than completely random ones.
In addition to modeling the expectation (location) of an outcome, mixed effects location scale models (MELSMs) include submodels on the variance components (scales) directly. This allows modeling the within-group variance with mixed effects and the between-group variances with fixed effects. The MELSM can be used to model volatility, intraindividual variance, uncertainty, measurement error variance, and more. Multivariate MELSMs (MMELSMs) extend the model to multiple correlated outcomes, and therefore multiple locations and scales. The latent multivariate MELSM (LMMELSM) further includes multiple correlated latent variables as outcomes. This package implements two-level mixed effects location scale models on multiple observed or latent outcomes, with between-group variance modeling. See Williams, Martin, Liu, and Rast (2020) <doi:10.1027/1015-5759/a000624> and Hedeker, Mermelstein, and Demirtas (2008) <doi:10.1111/j.1541-0420.2007.00924.x>.
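The core MELSM idea, simulated in base R: both the location and the (log) scale get group-level random effects, so within-group variance differs across groups by design (a conceptual sketch, not this package's interface):

    set.seed(13)
    J <- 30; n_j <- 20                      # groups and observations per group
    u_loc <- rnorm(J, 0, 1)                 # random intercepts: location
    u_scl <- rnorm(J, 0, 0.4)               # random intercepts: log scale
    g <- rep(1:J, each = n_j)
    y <- rnorm(J * n_j, mean = u_loc[g], sd = exp(-0.5 + u_scl[g]))
    tapply(y, g, sd)[1:5]                   # group SDs vary, as modeled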
Simulation results detailed in Esarey and Menger (2019) <doi:10.1017/psrm.2017.42> demonstrate that cluster-adjusted t statistics (CATs) are an effective method for correcting standard errors in scenarios with a small number of clusters. The mmiCATs package offers a suite of tools for working with CATs. The mmiCATs() function initiates a shiny web application facilitating the analysis of data with CATs, as implemented in the cluster.im.glm() function from the clusterSEs package. Additionally, the pwr_func_lmer() function simplifies running simulations that compare mixed effects models with CATs models. For educational purposes, the CloseCATs() function launches a shiny card game application aimed at enhancing users' understanding of the conditions under which CATs should be preferred over random intercept models.
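Underneath the app, the CATs computation comes from clusterSEs; a hedged sketch of calling it directly (cluster.im.glm() is named in the description above, while the argument details follow clusterSEs' documentation and should be verified):

    library(clusterSEs)
    set.seed(17)
    d <- data.frame(id = rep(1:10, each = 20), x = rnorm(200))
    d$y <- rbinom(200, 1, plogis(0.5 * d$x + rep(rnorm(10), each = 20)))
    fit <- glm(y ~ x, family = binomial, data = d)
    cluster.im.glm(fit, dat = d, cluster = ~id)   # cluster-adjusted t statistics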